Executive Summary



School choice remains an important part of the national discussion on education reform 
strategies and their benefits. While a variety of policies encourage parents’ selection of schools for their 
children — for example, charter schools, magnet schools, and district open enrollment — scholarships that 
allow students to attend a private school have received the most attention. The U.S. Congress’ passage of 
the District of Columbia School Choice Incentive Act of 2003 in January 2004 provided a unique 
opportunity not only to implement a system of private school choice for low-income students in the 
District, but also to rigorously assess the effects of the Program on students, parents, and the existing
school system. This report describes the first-year impacts of the Program on those who applied for and
were given the option to move from a public school to a participating private school of their choice. 



The DC Opportunity Scholarship Program 

The 2004 statute established what is now called the DC Opportunity Scholarship Program 
(OSP) — the first Federal government initiative to provide K-12 education scholarships to families to send 
their children to private schools. The OSP has the following programmatic elements: 



• To be eligible, students entering grades K-12 must reside in the District and have a 
family income at or below 185 percent of the Federal poverty line. 

• Participating students receive scholarships of up to $7,500 to cover the costs of tuition, 
school fees, and transportation to a participating private school. 

• Scholarships are renewable for up to 5 years (as funds are appropriated), as long as 
students remain eligible for the Program. 

• In a given year, if there are more eligible applicants than available scholarships or open 
slots in private schools, scholarships are awarded by lottery. 

• In making scholarship awards, priority is given to students attending public schools 
designated as in need of improvement (SINI) under the No Child Left Behind (NCLB) 
Act and to families that lack the resources to take advantage of school choice options. 

• Private schools participating in the Program must be located in the District of 
Columbia and must agree to requirements regarding nondiscrimination in admissions, 
fiscal accountability, and cooperation with the evaluation. 

The Washington Scholarship Fund (WSF), a 501(c)3 organization in the District of
Columbia, was selected by the U.S. Department of Education (ED) through a competition to operate the
Program. To date, there have been three rounds of applicants to the OSP (table ES-1). However, this
report, and the mandated evaluation of the Program, draws only on eligible applicants in spring 2004 and
in spring 2005 (cohorts 1 and 2) and, in particular, focuses on public school applicants whose award of a
scholarship was determined by lottery. Descriptive reports on each of the first 2 years of implementation
and cohorts of students have been previously prepared and released (Wolf, Gutmann, Eissa, Puma, and
Silverberg, 2005; Wolf, Gutmann, Puma, and Silverberg, 2006).¹ With the recent addition of a much
smaller third cohort of participants, as of fall of 2006, exactly 1,800 students were using Opportunity
Scholarships.



Table ES-1. OSP Applicants by Program Status, Cohorts 1, 2, and 3

                                               Cohort 1        Cohort 2        Total, Cohort 1   Cohort 3        Total,
                                               (Spring 2004)   (Spring 2005)   and Cohort 2      (Spring 2006)   All Cohorts
Applicants                                     2,692           3,126           5,818             576             6,394
Eligible applicants                            1,848           2,199           4,047             396             4,443
Scholarship awardees                           1,366           1,088           2,454             396             2,850
Scholarship users in initial year of receipt   1,027           797             1,824             328             2,152
Scholarship users fall 2005                    919             797             1,716             NA              1,716
Scholarship users fall 2006                    788             684             1,472             328             1,800

NOTES: Because most participating private schools closed their enrollments by mid-spring, applicants generally had their
eligibility determined based on income and residency, and the lotteries were held prior to the administration of baseline 
tests. Therefore, baseline testing was not a condition of eligibility for most applicants. The exception was applicants 
entering the highly oversubscribed grades 6-12 in cohort 2. Those who did not participate in baseline testing were 
deemed ineligible for the lottery and were not included in the eligible applicant figure presented above, though they 
were counted in the applicant total. In other words, the cohort 2 applicants in grades 6-12 had to satisfy income, 
residency, and baseline testing requirements before they were designated eligible applicants and entered into the 
lottery. 

The initial year of scholarship receipt is fall 2004 for cohort 1, fall 2005 for cohort 2, and fall 2006 for cohort 3.

SOURCES: The DC Opportunity Scholarship Program applications and the Program operator’s files. 



The Mandated Evaluation 

In addition to establishing the DC Opportunity Scholarship Program, Congress required an 
independent evaluation that uses “. . . the strongest possible research design for determining the 
effectiveness” of the Program. The Department of Education’s Institute of Education Sciences (IES),
responsible for the mandated evaluation, determined that the foundation of the evaluation would be a 
randomized controlled trial (RCT) that compares outcomes of eligible public school applicants (students
and their parents) randomly assigned to receive or not receive a scholarship. An RCT design is widely
viewed as the best method for identifying the independent effect of programs on subsequent outcomes
and has been used by researchers conducting impact evaluations of privately funded scholarship programs
in Charlotte, North Carolina; Dayton, Ohio; New York City; and Washington, DC.²

¹ Both of these reports are available on the Institute of Education Sciences’ Web site at: http://www.ies.ed.gov/ncee.

The RCT design for the OSP evaluation required more applications than scholarships or slots 
available in private schools, what we call “oversubscription,” to permit the random assignment of 
scholarships through lotteries. However, not all OSP applicants faced conditions for a lottery. The pool of 
eligible public school applicants in oversubscribed grades included 492 applicants in cohort 1 (spring 
2004) and 1,816 applicants in cohort 2 (spring 2005). Of those 2,308 eligible public school applicants 
who entered lotteries, 1,387 were randomly assigned to receive a scholarship (the “treatment” condition), 
and 921 were randomly assigned to not receive a scholarship (the “control” condition). The lotteries that 
generated these assignments took into account the statutory priorities, such that students from SINI 
schools had the highest probability within their grade bands of being awarded a scholarship, and students 
from other public schools had a lower probability of being awarded a scholarship. The OSP impact 
sample group includes the randomly assigned members of the treatment and control groups and comprises 
57 percent of all eligible applicants in the first 2 years of Program operation.³
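
The counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and the figures are those quoted in the paragraph above and in table ES-1.

```python
# Figures quoted in the text and in table ES-1; variable names are illustrative.
cohort1_lottery = 492          # eligible public school applicants in lotteries, cohort 1
cohort2_lottery = 1_816        # same, cohort 2
treatment = 1_387              # randomly assigned to receive a scholarship
control = 921                  # randomly assigned to not receive a scholarship
eligible_cohorts_1_2 = 4_047   # all eligible applicants, cohorts 1 and 2

# The two cohorts together form the impact sample.
impact_sample = cohort1_lottery + cohort2_lottery

# The treatment and control groups should account for every lottery entrant.
groups_total = treatment + control

share_of_eligible = impact_sample / eligible_cohorts_1_2
treatment_share = treatment / impact_sample

print(f"Impact sample: {impact_sample:,}")
print(f"Treatment + control: {groups_total:,}")
print(f"Share of eligible applicants: {share_of_eligible:.0%}")
print(f"Share assigned to treatment: {treatment_share:.0%}")
```

The unequal split between treatment and control reflects the lottery probabilities described above rather than a 50/50 assignment.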



Characteristics of Students in the Impact Sample



Students in the impact sample were either rising kindergartners or attending DC public 
schools in the year they applied for the OSP. The characteristics of the impact sample students when they 
applied reflect the Program’s income eligibility criteria and priorities as specified in the authorizing 
legislation: 



• Their average household at the time of application had almost three children supported 
by an annual income of $17,356. 



² RCTs are commonly referred to as the “gold standard” for evaluating educational interventions; when mere chance determines
which eligible applicants receive access to school choice, the students who apply but are not admitted make up an ideal
“control group” for comparison with the school choice “treatment group.” See chapter 3 for more detail on the RCT design and
analysis.

³ Students who were already attending a private school when they applied to the OSP are not included in the impact sample,
although a lottery was held for those applicants in cohort 1. Also not included in the impact sample are the 851 students who
applied in cohort 1 to enter grades K-5, all of whom received scholarships without a lottery because there were more private
school slots than applicants at that grade level.

• Although 80 percent of their mothers reported having a high school diploma, only 6
percent said they had a bachelor’s degree; 58 percent of the mothers reported working
full time.



• Nearly 90 percent were identified by their parents as African American, and 9 percent
were identified as being of Hispanic ethnicity. 

• Twelve percent were described by their parents as having special needs. 

• They are evenly divided between males and females. 

• About 44 percent of the impact sample was attending public schools designated SINI 
between 2003 and 2005. 

• The average impact sample student at the time of application had a reading scale score
of 608 and a math scale score of 588, which equate to the 33rd National Percentile
Rank (NPR) in reading and the 31st NPR in math.

After 1 year, 77 percent of the students awarded a scholarship were attending a participating 
private school. Fifteen percent of the students who were not awarded a scholarship were nevertheless 
enrolled in a private school. As has been true in other scholarship programs, not all treatment group 
students offered scholarships choose to attend a private school, and some students in the control group 
find their way into private schools even without a Program scholarship. 

Impact sample students who used their OSP scholarship were enrolled in 47 of the 68 
participating private schools and were clustered in those schools that offered the most slots to OSP 
students. Of the students in this group, 8.4 percent were attending a school charging tuition above the 
statutory cap of $7,500 in their first year in the Program, even though 39 percent of all participating 
schools charged tuitions above the cap at that time. The average tuition charged at the schools that these 
scholarship students attended was $5,253 but varied between $3,400 and $24,545.⁴ The average OSP
student in this group attended a school with 177 students — somewhat smaller than the average of 236 
students across the full set of participating schools. These OSP students are concentrated in the 
participating private schools with higher minority enrollments but with student/teacher ratios that are 
approximately representative of the entire set of OSP schools. Nearly two-thirds of these OSP students are 
attending participating schools operated by the Catholic Archdiocese of Washington. 

⁴ The WSF reported that families were not required to pay for tuition out-of-pocket in almost all cases where the tuition charged
by the school exceeded the $7,500 cap.

In interpreting the presence or absence of Program impacts, it is important to understand the
difference between the treatment and control groups in their educational environments and experiences.
Examining the characteristics of the schools attended by students in the treatment and control groups
suggests

• There were no significant differences between treatment and control students in the 
characteristics of the public schools they attended at the time of application. 

• One year later, a similar proportion of students in the treatment and control groups 
were attending schools that offered libraries, gyms, special programs for advanced 
learners, individual tutors, art programs, and after-school programs. 

• One year later, students in the treatment group were more likely than those in the 
control group to have a computer lab or music program available to them at school. 
The treatment group was less likely to have access at school to a cafeteria, nurse’s 
office, counselors, or special programs for either non-English speakers or students with 
learning problems. 



The Impact of the Program After 1 Year 

The statute that authorized the OSP mandated that the Program be evaluated with regard to 
its impact on student test scores and safety, as well as the “success” of the Program, which we interpret to 
include satisfaction with school choices. So far, the analysis can only estimate the effects of the Program 
on these outcomes 1 year after families and students applied to the OSP, or approximately 7 months after 
the start of students’ first school year in the Program. 



Impact of Being Awarded a Scholarship (Experimental Estimates) 

To estimate the extent to which the Program has an effect on participants, the study first 
compares the outcomes of the two experimental groups created through random assignment, called the 
“intent-to-treat” (ITT) approach. The only completely randomized and therefore strictly comparable 
groups in the study are those students whom the lottery determined were offered scholarships (the 
treatment group) and those who were not offered scholarships (the control group). The random 
assignment of students into treatment and control groups should, and did here, produce groups that are 
similar in key characteristics, both those we can observe and measure (e.g., family income, prior 
academic achievement) and those we cannot (e.g., motivation to succeed or benefit from the Program). A 
comparison of these two groups is the most robust and reliable measure of Program impacts because it 
requires the fewest assumptions to make the groups similar except for their participation in the Program. 

The impact analysis proceeded in four steps:



1. The impacts of the Program on each outcome of interest were estimated for the entire
sample of study participants, using an analytic model and well-established statistical
approaches that were specified in advance.

2. Those same impacts were estimated for various policy-relevant subgroups of 
participants that differed based on the “need of improvement” status of their school 
(SINI), their baseline academic performance, their gender, their schooling level, and 
their cohort status. 

3. A reliability test was administered to the results drawn from multiple comparisons of 
treatment and control group members (e.g., across 10 different subgroups) to identify 
any statistically significant findings that could be due to chance, or what statisticians 
refer to as “false discoveries.” 

4. The results were subjected to sensitivity tests that involved re-estimating the impacts 
using three alternative analytic approaches. 

The findings discussed below are robust to adjustments for multiple comparisons and sensitivity tests 
unless specified. 
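
Step 3 can be illustrated with a short sketch. This summary does not name the adjustment procedure, so the Benjamini-Hochberg step-up rule used below is an assumption, chosen because it is a common way to limit false discoveries. The p-values are the ten math subgroup values reported in table ES-2; the function name and the 0.05 rate are illustrative.

```python
def benjamini_hochberg(p_values, rate=0.05):
    """Return booleans marking which findings survive the step-up adjustment."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) whose p-value is at most k/m * rate.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * rate:
            cutoff_rank = rank

    # Every finding ranked at or below the cutoff survives.
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            survives[idx] = True
    return survives


# Math subgroup p-values as reported in table ES-2.
math_p = {
    "SINI ever": 0.93, "SINI never": 0.04,
    "Lower performance": 0.81, "Higher performance": 0.03,
    "Male": 0.57, "Female": 0.06,
    "K-8": 0.11, "9-12": 0.43,
    "Cohort 2": 0.07, "Cohort 1": 0.74,
}

flags = benjamini_hochberg(list(math_p.values()))
surviving = [name for name, keep in zip(math_p, flags) if keep]
print(surviving)  # prints []
```

Under these assumptions no subgroup finding survives: the smallest p-value (.03) would need to be at or below .005 to pass at rank 1. That is consistent with the caution attached to the subgroup results in this summary.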



The analysis suggests the following findings regarding the impacts of a scholarship offer 

(table ES-2): 

• The main models indicate that the Program generated no statistically significant 
impacts, positive or negative, on student reading or math achievement for the entire 
impact sample in year 1. One of the three alternative specifications indicated a positive
and statistically significant math impact of 3.4 scale score points. 

• No statistically significant achievement impacts were observed for the high-priority 
subgroup of students who had attended a SINI public school under NCLB before 
applying to the Program. 

• The Program may have had an impact on math achievement for two subgroups of 
students with baseline characteristics associated with better academic preparation. The 
main models suggest that the OSP improved the math achievement of participating 
students who had not attended a SINI school by 4.7 scale score points and increased 
the math scores of those with relatively higher test score performance at baseline by 
4.3 scale score points. However, these findings should be interpreted with caution, as 
adjustments for multiple comparisons suggested they may be false discoveries. 

• No significant achievement impacts were observed for other subgroups of participating 
students, including those with lower test scores at baseline, girls, boys, elementary 
students, secondary students, or students within each of the individual cohorts that in 
combination made up the impact sample. 

Table ES-2. Year 1 Test Score Differential ITT Regression-Based Impact Estimates

Reading

                          Treatment     Control       Difference
Student Achievement       Group Mean    Group Mean    (Estimated Impact)   Effect Size   p-value
Full sample               606.20        605.18        1.03                 .03           .56
Subgroups:
  SINI ever               625.50        625.74        -.24                 -.01          .92
  SINI never              592.14        590.10        2.04                 .05           .45
    Difference            33.62         35.64         -2.27                -.06          .54
  Lower performance       580.89        582.48        -1.59                -.05          .65
  Higher performance      617.19        614.75        2.44                 .07           .25
    Difference            -36.30        -32.27        -4.03                -.11          .34
  Male                    607.08        605.51        1.56                 .04           .55
  Female                  605.40        604.88        .52                  .01           .84
    Difference            1.68          .64           1.05                 .03           .78
  K-8                     590.80        589.30        1.50                 .04           .45
  9-12                    676.23        677.33        -1.10                -.04          .73
    Difference            -85.44        -88.03        2.60                 .07           .49
  Cohort 2                591.77        592.15        -.38                 -.01          .85
  Cohort 1                659.13        653.03        6.10                 .20           .11
    Difference            -67.36        -60.88        -6.48                -.18          .14

Math

                          Treatment     Control       Difference
Student Achievement       Group Mean    Group Mean    (Estimated Impact)   Effect Size   p-value
Full sample               595.61        592.87        2.74                 .08           .07
Subgroups:
  SINI ever               568.30        568.10        .20                  .01           .93
  SINI never              615.57        610.89        4.68*                .12           .04
    Difference            -47.27        -42.79        -4.48                -.13          .17
  Lower performance       576.07        576.72        -.66                 -.02          .81
  Higher performance      603.95        599.66        4.30*                .12           .03
    Difference            -27.88        -22.93        -4.95                -.14          .16
  Male                    595.89        594.61        1.27                 .04           .57
  Female                  595.43        591.25        4.18                 .12           .06
    Difference            .46           3.36          -2.90                -.08          .38
  K-8                     577.63        574.86        2.77                 .07           .11
  9-12                    677.27        674.67        2.60                 .10           .43
    Difference            -99.64        -99.81        .17                  .00           .96
  Cohort 2                579.35        576.16        3.19                 .09           .07
  Cohort 1                655.32        654.22        1.10                 .04           .74
    Difference            -75.97        -78.06        2.09                 .06           .58
* Statistically significant at the 95 percent confidence level. 



NOTES: Means are regression-adjusted using a consistent set of baseline covariates. Impacts are displayed in terms of scale 
scores and effect sizes in terms of standard deviations. Valid N for reading = 1,649; math = 1,715. Separate reading and
math sample weights used. 

• The Program had a substantial positive impact on parents’ views of school safety but
not on students’ actual school experiences with dangerous activities. Parents in the
treatment group perceived their child’s school to be less dangerous (an impact of -0.74
on a 10-point scale) than parents in the control group. Student reports of dangerous 
incidents in school did not differ systematically between the treatment and control 
groups. 

• The Program also had an impact on parent satisfaction with their child’s school. For 
example, an additional 19 percent of the parents of students in the treatment group 
graded their child’s school “A” or “B” compared with the parents of control group 
students. 

• For the most part, student satisfaction with their school was unaffected by the Program. 
The main exception was for students with lower test score performance at baseline, 
who on average assigned their schools significantly lower grades if they were in the 
treatment group. 



Additional Findings Regarding Using a Scholarship and Attending a Private School (Non- 
experimental) 



The results described above answer the question “what happened to OSP applicants who 
were offered a scholarship, whether or not a student used the scholarship to attend a private school?” 
Estimating the impact of using an OSP scholarship involves statistically adjusting the initial impact 
results to account for two groups of impact sample students: (1) the approximately 20 percent who received but
failed to take up the scholarship offer, who presumably had zero impact from the Program, and (2) an
estimated 4 percent in the control group who never received a scholarship offer but who, by virtue of
having a sibling with an OSP scholarship, wound up in a participating private school (what we call
“program-induced crossover”). These straightforward statistical adjustments yield what are typically
called the “impact-on-the-treated” or IOT results. These adjustments increase the size of the scholarship
offer effect estimate, but cannot make a statistically insignificant result significant. Therefore, the 
adjustments are only applied to results that were statistically significant at the scholarship offer stage of 
the analysis. 
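
The adjustment described above is often implemented as a simple rescaling, sometimes called a Bloom adjustment: the offer effect is divided by the share of the treatment group that used a scholarship, net of the program-induced crossover share in the control group. The sketch below assumes that form and uses the rounded rates quoted in this paragraph (about 80 percent take-up, about 4 percent crossover), so its output only approximates any figure derived from unrounded rates. The function and variable names are illustrative.

```python
def impact_on_treated(itt_impact, takeup_rate, crossover_rate=0.0):
    """Rescale an intent-to-treat impact to an impact-on-the-treated estimate."""
    net_rate = takeup_rate - crossover_rate
    if net_rate <= 0:
        raise ValueError("take-up must exceed crossover")
    return itt_impact / net_rate


TAKEUP = 0.80      # assumed from "approximately 20 percent" not using the offer
CROSSOVER = 0.04   # estimated program-induced crossover in the control group

# Offer effect on math for students from non-SINI schools, in scale score points.
itt_math_sini_never = 4.7
iot = impact_on_treated(itt_math_sini_never, TAKEUP, CROSSOVER)
print(f"{iot:.1f}")
```

Because the estimate and its standard error are divided by the same constant, the test statistic does not change, which is why this kind of adjustment enlarges an effect without altering its statistical significance.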



The statistically significant findings regarding the use of a scholarship include: 



• Using a scholarship led to positive impacts on math scores for students from non-SINI 
schools (6.1 scale score points compared to 4.7 scale score points for the impact of 
scholarship award) and for students with higher test scores at baseline (5.6 scale score 
points compared to 4.3 scale score points for the impact of scholarship award). 
However, adjustments for multiple comparisons indicate both of these findings may be 
false discoveries. 
• Scholarship use led to an average reduction of nearly a point on the 10-point danger
perception index for parents, compared to a -0.74 point impact for the award of a 
scholarship. 

• Using a scholarship significantly increased parent satisfaction with their child’s school. 
An additional 25 percent of the parents of scholarship users graded their child’s school 
“A” or “B” compared to the parents of control group students, while the difference was 
19 percent for the impact of the offer of a scholarship. 

Estimating the effect of attending a private school, regardless of whether an OSP scholarship
was used, also begins with the original impact results but uses a more complex statistical procedure.⁵
Because this approach deviates somewhat from the overall experimental design of the evaluation, and
yields estimates that are less precise, the private schooling results should be interpreted and used with
caution. Like those applied to estimate the impact of OSP scholarship use, the private schooling
adjustments increase the size of the scholarship offer effect estimate, but cannot make an insignificant
result significant. Therefore, the procedure is only applied to results that were statistically significant at 
the scholarship offer stage of the analysis. 



The main private schooling results suggest that 



• Private schooling was associated with higher math achievement for SINI-never 
students (by 7.8 scale score points) and for students with higher test scores at baseline
(by 6.7 scale score points), but both of these findings may be false discoveries due to 
multiple comparisons. These private schooling differences were larger than were the 
impacts of scholarship award and scholarship use for SINI-never students (4.7 points 
and 6.1 points, respectively) and for students with higher test scores at baseline (4.3 
points and 5.6 points, respectively). 

• Private schooling is associated with lower parent perceptions of danger and higher 
parent satisfaction. The average score for private school parents represented 1.14 
fewer points or areas of concern on the 10-point school danger index than the average 
score for public school parents, compared to impacts of -0.74 points for scholarship 
award and a reduction of one point for scholarship use. Similarly, parents of private 
school students were 30 percent more likely to grade their child’s school an “A” or “B” 
than were parents of public school students, compared to impact on this measure of 19 
percent for scholarship award and 25 percent for scholarship use. 



⁵ The scholarship lottery is used as an instrumental variable (IV) to predict whether a student attended private school. Unlike an
indicator variable for actual attendance at a private school, the prediction of private school attendance using the scholarship 
lottery instrument is unbiased because it is the same for all treatment group students (and all control group students) regardless 
of their individual enrollment decisions. 
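
The instrumental-variable idea in this footnote can be sketched in its simplest form, the Wald ratio: the lottery's effect on the outcome divided by the lottery's effect on private school attendance. This is an illustration under strong simplifying assumptions (no covariates, no sample weights), not the evaluation's actual procedure, so it will not reproduce the reported private schooling estimates exactly. The enrollment rates are the rounded figures quoted earlier in this summary (77 percent of the treatment group and 15 percent of the control group in private school after 1 year), and the function name is illustrative.

```python
def wald_estimate(outcome_treat, outcome_ctrl, attend_treat, attend_ctrl):
    """Ratio of the lottery's effect on the outcome to its effect on attendance."""
    first_stage = attend_treat - attend_ctrl
    if first_stage == 0:
        raise ValueError("the lottery must shift private school attendance")
    return (outcome_treat - outcome_ctrl) / first_stage


# Regression-adjusted math means for the SINI-never subgroup (table ES-2),
# combined with the rounded private school enrollment rates.
estimate = wald_estimate(615.57, 610.89, attend_treat=0.77, attend_ctrl=0.15)
print(f"{estimate:.1f}")
```

The denominator is smaller here than in the scholarship-use adjustment because control group students who reached private school by any route are netted out, which is why the private schooling estimates are larger than the scholarship-use estimates.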

These results can be placed in the context of other RCTs of scholarship programs for low-income
students, which suggest no consistent pattern of academic achievement impacts for the first year
of program participation. Among such evaluations of four privately funded scholarship programs, one 
study of the Charlotte, North Carolina, program clearly found statistically significant overall impacts on 
math and reading for the first year, while one of three analyses of the New York City program found 
overall impacts on math achievement (Barnard, Frangakis, Hill, and Rubin, 2003; Greene, 2000). When
African-Americans are considered separately, a group that makes up nearly 90 percent of the OSP impact 
study sample, two of three analyses of the New York City program suggest there were achievement gains 
in math for African-American students in some grade levels (Mayer, Peterson, Myers, Tuttle, and Howell, 
2002), but studies of the Dayton, Ohio, and earlier District of Columbia programs found no impacts for 
this group until students were in the program for 2 years (Howell, Wolf, Campbell, and Peterson, 2002). 
In contrast, all of the randomized controlled trials that measured parent satisfaction and perceptions of 
school safety found positive impacts similar to those demonstrated by the OSP the first year (Greene, 
2000; Howell and Peterson et al., 2002).

The findings here are based on information collected only a year after students applied to the 
Program and may not reflect the consistent impacts of the OSP over a longer period of time. Families that 
apply to voucher programs intend for their children to leave their current public schools and, in the case 
of the OSP, a much higher share of students in the treatment group (91.3 percent) switched schools — 
mostly from public to private — compared to those in the control group (56.6 percent). The first-year 
results, therefore, provide an early look at student experiences in what was a transitional year for most of 
them. Future reports will examine impacts 2 and 3 years after application to the Program, when any short- 
term effect of students’ transition to new schools may have dissipated. The later reports will also consider 
additional outcome measures, assess the extent to which school characteristics are associated with 
impacts, and examine how the DC public school system is changing in response to the Program. 


1. Introduction



School choice remains an important part of the national discussion on education reform 
strategies and their benefits. While a variety of policies encourage parents’ selection of schools for their 
children — for example, charter schools, magnet schools, and district open enrollment — scholarships that 
allow students to attend a private school have received the most attention. The U.S. Congress’ passage of 
the District of Columbia School Choice Incentive Act of 2003¹ in January 2004 provided a unique
opportunity not only to implement a system of private school choice for low-income students in the
District, but also to rigorously assess the effects of the Program on students, parents, and the existing
school system. This report describes the first-year impacts of the Program on those who applied for and
were given the option to move from a public school to a participating private school of their choice. 



1.1 The DC Opportunity Scholarship Program 



The 2004 statute established what is now called the DC Opportunity Scholarship Program 
(OSP) — the first Federal government initiative to provide K-12 education scholarships to families to send 
their children to private schools. The OSP has the following programmatic elements: 



• To be eligible, students entering grades K-12 must reside in the District and have a 
family income at or below 185 percent of the Federal poverty line. 

• Participating students receive scholarships of up to $7,500 to cover the costs of tuition, 
school fees, and transportation to a participating private school. 

• Scholarships are renewable for up to 5 years (as funds are appropriated), so long as 
students remain eligible for the Program and remain in good academic standing at the 
private school they are attending. 

• In a given year, if there are more eligible applicants than available scholarships or open 
slots in private schools, applicants are to be awarded scholarships by random selection 
(e.g., by lottery). 

• In making scholarship awards, priority is given to students attending public schools 
designated as in need of improvement (SINI) under the No Child Left Behind (NCLB) 
Act and to families that lack the resources to take advantage of school choice options. 



• Private schools participating in the Program must be located in the District of
Columbia and must agree to requirements regarding nondiscrimination in admissions,
fiscal accountability, and cooperation with the evaluation.

¹ Title III of Division C of the Consolidated Appropriations Act, 2004, P.L. 108-199.

Following passage of the legislation, the Washington Scholarship Fund (WSF), a 501(c)3 
organization in the District of Columbia, was selected in late March 2004 by the U.S. Department of 
Education (ED) to implement the OSP, under the supervision of both ED’s Office of Innovation and 
Improvement and the Office of the Mayor of the District of Columbia. Since then, the WSF has worked
with its implementation partners² to finalize the Program design, establish protocols, recruit applicants
and schools, award scholarships, and place and monitor scholarship awardees in participating private 
schools. The funds appropriated for the OSP are sufficient to support approximately 1,700 to 1,800 
students in a given year, depending on the cost of the participating private schools that they attend. 

To date, there have been three rounds of applicants to the OSP: 

• Applicants in spring 2004 (cohort 1), 

• Applicants in spring 2005 (cohort 2), and 

• Applicants in spring 2006 (cohort 3). 

This report, and the mandated evaluation (see below), focuses on a subset of applicants in 
spring 2004 and in spring 2005 (cohorts 1 and 2). In these 2 years, there were a total of 5,818 applicants, 
of which 4,047 were deemed eligible to participate in the Program. Scholarships were offered to 2,454 of 
these eligible applicants, and 1,824 students used their scholarship in the first year of scholarship receipt 
(table 1-1). Descriptive reports on each of these first 2 years of Program implementation have been 
previously prepared and released (Wolf et ah, 2005; Wolf et ah, 2006).^ A much smaller number of 
cohort 



^ The WSF has joined with Capital Partners for Education, DC Parents for School Choice, and Fight for Children — all District- 
based nonprofit organizations, to assist in client recruitment and implementation activities. 

^ Both of these reports are available on the Institute of Education Sciences’ Web site at: http://www.ies.ed.gov/ncee. 
3 students were recruited and enrolled by WSF in spring 2006 in order to keep the Program operating at 
capacity.^ As of fall of 2006, exactly 1,800 students were using Opportunity Scholarships. 

Table 1-1. OSP Applicants by Program Status, Cohorts 1, 2, and 3 

                                               Cohort 1        Cohort 2        Total, Cohort 1   Cohort 3        Total, 
                                               (Spring 2004)   (Spring 2005)   and Cohort 2      (Spring 2006)   All Cohorts 

Applicants                                     2,692           3,126           5,818             576             6,394 
Eligible applicants                            1,848           2,199           4,047             396             4,443 
Scholarship awardees                           1,366           1,088           2,454             396             2,850 
Scholarship users in initial year of receipt   1,027           797             1,824             328             2,152 
Scholarship users fall 2005                    919             797             1,716             NA              1,716 
Scholarship users fall 2006                    788             684             1,472             328             1,800 



NOTES: Because most participating private schools closed their enrollments by mid-spring, applicants generally had their 
eligibility determined based on income and residency, and the lotteries were held prior to the administration of baseline 
tests. Therefore, baseline testing was not a condition of eligibility for most applicants. The exception was applicants 
entering the highly oversubscribed grades 6-12 in cohort 2. Those who did not participate in baseline testing were 
deemed ineligible for the lottery and were not included in the eligible applicant figure presented above, though they 
were counted in the applicant total. In other words, the cohort 2 applicants in grades 6-12 had to satisfy income, 
residency, and baseline testing requirements before they were designated eligible applicants and entered into the 
lottery. 

The initial year of scholarship receipt is fall 2004 for cohort 1, fall 2005 for cohort 2, and fall 2006 for cohort 3. 

SOURCES: The DC Opportunity Scholarship Program applications and the Program operator’s files. 



1.2 The Mandated Evaluation 



In addition to establishing the OSP, Congress required an independent evaluation that uses 
“. . . the strongest possible research design for determining the effectiveness” of the Program.® The 
legislation indicated that the evaluation should include analyses of the effects of the Program on various 



* Because the influx of cohort 2 participants essentially filled the Program, the WSF recruited and enrolled a much smaller 
cohort 3 to replace OSP students who left the Program between the second and third year of implementation. WSF limited 
cohort 3 applications to students entering grades K-6 because there were few slots available in participating junior high and 
high schools, as large numbers of students from cohorts 1 and 2 advanced to those grades. Applications also were limited to 
students previously attending public schools or rising kindergartners, since public school students are a higher service priority 
of the Program than are otherwise eligible private school students. 

^ Combined with the 1,458 (85 percent) cohort 1 and 2 students who renewed for 2006-07, the OSP was supporting a total of 
1,786 students in participating private schools in fall of 2006. Of the 258 students from cohorts 1 and 2 who did not renew 
their scholarships for 2006-07, 7 graduated from high school, 57 moved out of the District of Columbia, 40 were in families 
that earned their way out of income eligibility and were not supported by private “bridge” funds before the passage of the 
income ceiling amendment in the recent Congress, and 154 simply chose not to re-enroll. A total of 68 students were in 
families that earned their way out of income eligibility but were supported by private bridge funds and therefore were able to 
continue their enrollments in their participating private schools. These figures were provided to the evaluation team by the 
WSF based on its administrative records. 

^ District of Columbia School Choice Incentive Act of 2003, Section 309 (a)(2)(A). 
academic and non-academic outcomes of concern to policymakers.^ This legislative mandate led the 
evaluators to focus on the following research questions: 



1. What is the impact of the Program on student academic achievement? Does the award 
of a scholarship improve a student’s academic achievement in the core subjects of 
reading and mathematics? 

2. What is the impact of the Program on other student measures (e.g., school attendance 
and educational attainment)? Does the award of a scholarship improve other important 
aspects of a student’s education that are related to school success? 

3. What effect does the Program have on school safety and satisfaction? Does the award 
of a scholarship increase student and/or parent perceptions of safety with school? Does 
the award of a scholarship increase student and/or parent satisfaction with school? 

4. What is the effect of attending private versus public schools? Because some students 
offered scholarships will choose not to use them, and some members of the control 
group will attend private schools, the study will also examine the results associated 
with private school attendance with or without a scholarship. * 

5. To what extent is the Program influencing public schools and expanding choice options 
for parents in Washington, DC? That is, to what extent has the scholarship program 
had a broader effect on public and private schools in the City, such as instructional 
changes by public schools to respond to the new competition from private schools? 

ED’s Institute of Education Sciences (IES), responsible for the mandated evaluation, 
determined that the foundation of the evaluation would be a randomized controlled trial (RCT), 
comparing outcomes of eligible applicants (students and their parents) randomly assigned to receive or 
not receive a scholarship. This decision was based on the mandate to use rigorous evaluation methods, the 
expectation that there would be more applicants than funds and private school spaces available, and the 
Program requirement to use lotteries to determine who receives a scholarship when there is more demand 
for scholarships than can be accommodated. The law clearly specified that such a comparison in 



’ “The issues to be evaluated include the following: (A) A comparison of the academic achievement of participating eligible 
students... to the achievement of. ..the eligible students in the same grades... who sought to participate in the scholarship 
program but were not selected. (B) The success of the programs in expanding choice options for parents. (C) The reasons 
parents choose for their children to participate in the programs. (D) A comparison of retention rates, dropout rates, and (if 
appropriate) graduation and college admission rates... (E) The impact of the program on students, and public elementary 
schools and secondary schools, in the District of Columbia. (F) A comparison of the safety of the schools attended by students 
who participate in the programs and the schools attended by students who do not participate in the programs. (G) Such other 
issues as the Secretary considers appropriate for inclusion in the evaluation.” (Section 309 (4)). The statute also says that, “(A) 
the academic achievement of students participating in the program; (B) the graduation and college admission rates of students 
who participate in the program, where appropriate; and (C) parental satisfaction with the program” should be examined in the 
reports delivered to the Congress. (Section 310 (b)(1)). 

* Although the statute does not explicitly request analyses of the effects of private schooling, it does request comparisons 
between “program participants,” which could be understood to mean students using a scholarship to attend private school, and 
non-participants. 
outcomes be made.^ An RCT design is widely viewed as the best method for identifying the independent 
effect of programs on subsequent outcomes and has been used by researchers conducting impact 
evaluations of other scholarship programs in New York City; Dayton, Ohio; and Washington, DC. 



1.3 Contents of This Report 

This report is the third in a series of required annual evaluation reports to Congress. It 
presents the impacts of the Program on students and families 1 year after they applied and had the chance 
of being awarded and using a scholarship to attend a participating private school. 

In presenting these impacts, we first provide some background on the implementation of the 
OSP and the students and schools that are part of the Program, much of which has been described in prior 
evaluation reports (chapter 2). We then present the research and analysis methods used in the impact 
evaluation, including data collection, imputation, statistical weighting, and the models used to estimate 
Program impacts (chapter 3). The main impact results, both for the overall group and for important 
subgroups of applicants, are described in chapter 4; these findings address whether students who received 
a scholarship through the lotteries (and their parents) benefited from 1 year in the Program by comparing 
their early outcomes to the outcomes of students who applied for but did not receive scholarships through 
the lotteries. The final chapter (chapter 5) deviates somewhat from the random assignment design to 
assess the impact of the OSP on those students who actually used their scholarship to attend a private 
school, since not all scholarship awardees did, and to estimate differences in outcomes between those who 
attended private schools and those who did not, using the lottery results as an instrument to control for 
likely selectivity in such non-random participant samples. 

The findings here are based on information collected only a year after students applied to the 
Program and may not reflect the consistent impacts of the OSP over a longer period of time. Families that 
apply to voucher programs intend for their children to leave their current public schools, and, in the case 



See Section 309 (a)(4)(A)(ii). 

RCTs are commonly referred to as the “gold standard” for evaluating educational interventions; when mere chance determines 
which eligible applicants receive access to school choice, the students who apply but are not admitted make up an ideal 
“control group” for comparison with the school choice “treatment group.” Both groups of participants are equally motivated to 
obtain new educational options, and nothing except a random draw distinguishes those who receive the opportunity from those 
who do not. Therefore, any differences in the two groups in subsequent years can be attributed to the impact of the program. In 
contrast, the results of school choice studies that are not based on RCTs must be interpreted and used more cautiously because 
comparisons between the applicants and a group of students who chose not to apply will likely reflect not only the impact of 
the program but also initial differences between the groups in motivation and other unmeasured characteristics. See chapter 3 
for more detail on the RCT design and analysis. 
of the OSP, a much higher share of students in the treatment group (91.3 percent) switched schools — 
mostly from public to private — compared to those in the control group (56.6 percent). The first-year 
results, therefore, provide an early look at student experiences in what was a transitional year for most of 
them. Future reports will examine impacts 2 and 3 years after application to the Program, when any short- 
term effect of students’ transition to new schools may have dissipated. The later reports will also consider 
additional outcome measures, assess the extent to which school characteristics are associated with 
impacts, and examine how the DC public school system is changing in response to the Program. 

In the end, the findings in this and subsequent reports are a reflection of the particular 
Program elements that evolved from the law passed by Congress and the characteristics of the students, 
families, and schools — both public and private — that exist in the Nation’s capital. The same program 
implemented in another city might yield different results, and a different scholarship program 
administered in Washington, DC, might also produce different outcomes. Thus, while the results 
presented here will contribute to the research evidence on scholarships in general, they are most relevant 
to the specific program that is being evaluated and described in the following chapters. 
2. Early Implementation of the Program and the Sample for the Impact Analysis 



The recruitment, application, and lottery process conducted under the guidance of the 
Washington Scholarship Fund (WSF) created the foundation for the impact analysis that is the focus of 
this report. The schools recruited into the Opportunity Scholarship Program (OSP) determined the 
number of private school slots available and, ultimately, the quality of instruction to which Program 
participants were exposed. The students who applied, combined with the slots available, established the 
parameters for the lotteries and may well influence whether they benefit from the Program. This chapter 
provides additional detail regarding the OSP, including the design of the lotteries that enable the study to 
be experimental in design and execution, the characteristics of the students that are the Program and 
impact study participants, and the types of schools the students were enrolled in when they applied and 1 
year later. It is designed to communicate how and when the Program was implemented and the conditions 
under which the impact evaluation took place. 



2.1 Student Recruitment 

Very quickly after it received the grant to operate the OSP, WSF and its partners began to 
recruit families to participate in the Program. In addition to numerous mailings and visits to schools and 
churches, application events were held throughout the District of Columbia. The form necessary for 
applying to the Program required parents to confirm that student applicants met all eligibility 
criteria — residing in DC and entering kindergarten through grade 12 — and to provide documentation for 
verification purposes, including residency and income; it also functioned as the baseline or “pre-program” 
survey for the evaluation and included a parent consent form for the evaluation’s data collection. 

Over the first 2 years of recruitment, in spring 2004 and 2005, WSF received applications 
from 5,818 students. Of these, approximately 70 percent (4,047 of 5,818) were eligible to enter the 
Program. These eligible applicants represent about 10 percent of the population in Washington, DC, that 
met the Program’s eligibility criteria, according to 2000 Census figures (table 2-1). 
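
The two shares quoted above follow directly from the counts given in the text. A minimal sketch of the arithmetic, using only the reported figures (the variable names are illustrative and do not come from the report):

```python
# Counts reported in the text for cohorts 1 and 2 combined.
total_applicants = 5_818
eligible_applicants = 4_047

# 2000 Census approximation of DC students meeting the income criterion,
# as reported in table 2-1.
low_income_students_dc = 40_507

# Share of applicants found eligible (reported as "approximately 70 percent").
eligibility_rate = eligible_applicants / total_applicants

# Eligible applicants relative to the low-income student population
# (reported as "about 10 percent").
coverage_rate = eligible_applicants / low_income_students_dc

print(f"Eligible share of applicants: {eligibility_rate:.1%}")
print(f"Eligible applicants as share of low-income students: {coverage_rate:.1%}")
```

Because the denominator is a 2000 Census figure, the second share is an approximation for 2004 and 2005 rather than an exact rate.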
Table 2-1. OSP Applicants by Program Status, Spring 2004 and Spring 2005 

Measure                                                       Spring 04   Spring 05   Total 

Low-income students in DC                                     40,507      40,507      40,507 
Total applicants                                              2,692       3,126       5,818 
Eligible applicants                                           1,848       2,199       4,047 
Eligible applicants as percent of low-income students in DC   5%          5%          10% 



NOTES: The total of low-income students in DC represents a constant approximation of the maximum number of 
eligible students who could conceivably apply to the Program in a given year. Total applicants and eligible 
applicants are unique students who have applied. Any student who applied in both years is counted only 
once. Applicants entering grades 6-12 in cohort 2 who did not participate in baseline testing were not 
included in the eligible applicant figure. 

SOURCES: Figures for low-income students are based on data from the U.S. Census, population of the District of 
Columbia ages 5 to 17 under 185 percent of the Federal poverty line in 2000. The exact numbers for 2004 
and 2005 are likely to differ somewhat from this 2000 figure. Numbers of applicants and eligible 
applicants are from the DC OSP applications. 



2.2 The OSP Lotteries and the Creation of the Impact Sample 

Once students applied and were verified eligible for the Program, the next step was to 
determine whether they would receive a scholarship. As noted in chapter 1, the statute specified that 
lotteries be conducted to award scholarships when the Program is “oversubscribed,” that is, when the 
number of eligible applicants exceeds the number of available slots in participating private schools.’^ 
Further, the statute specified that certain groups of applicants be given priority in any such lotteries, 
which led to the following classifications: 

• Applicants attending a public school in need of improvement (SINI) under No Child 
Left Behind (NCLB) (highest priority); 

• Non-SINI public school applicants (middle priority); and 

• Applicants already attending private schools (lowest priority). 

However, not all students faced conditions for a lottery. In the first year of Program 
implementation (spring 2004 applicants, referred to as cohort 1), for example, there were more slots in 
participating schools than there were applicants for grades K-5;'* therefore, all eligible K-5 applicants 
from SINI and non-SINI public schools automatically received scholarships, and no lotteries were 



However, because the extent of oversubscription varied significantly by grade, in practice the determination of whether 
to hold a lottery was considered within grade bands: those applying for grades K-5, those applying for grades 6-8, and 
those applying for grades 9-12. 

** Throughout this report, applicants are always categorized by the grade they are forecasted to be entering for the next school 
year. Therefore, kindergartners (K) are actually pre-schoolers who are “rising kindergartners.” 





conducted at that level. In contrast, there were more eligible public school applicants in cohort 2 (spring 
2005) than there were available slots at all grade levels, so that all of those applicants were subject to a 
lottery to determine scholarship awards. One other difference is that, because there were sufficient funds 
available in school year 2004-05, applicants seeking an OSP scholarship but who were already attending a 
private school were entered into a lottery the first year. In cohort 2, there was sufficient demand from 
public school applicants that lotteries were conducted only for them; applicants who were already 
attending a private school (the lowest priority group) were not entered into a lottery and did not receive 
scholarships. 



Lottery Design and Outcomes 

In general, the probability of being awarded a scholarship through a lottery was based on a 
given student’s priority status and the ratio of slots to applicants in that student’s grade band (grades K-5, 
grades 6-8, and grades 9-12). Within a given grade band, applicants from SINI-designated public schools 
were assigned award probabilities approximately one-third higher than those from non-SINI public 
schools. Eligible applicants from private schools were assigned much lower probabilities than either type 
of public school applicants in the first year, and an award probability of 0 in the second year, when the 
Program was oversubscribed with higher priority public school applicants. Across the grade bands, the 
award probabilities were determined by the degree of oversubscription for those grades. Given the 
likelihood that some students would choose not to use the scholarships that were awarded to them, 
based on previous scholarship program experiments, award probabilities were then adjusted to 
“over-award” scholarships by approximately 20 percent'^ (see Howell and Peterson et al., 2002, p. 44). 
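
The report does not give the exact formula used to set award probabilities, so the following is only a hedged sketch of how the three ingredients described above (slots relative to applicants within a grade band, a SINI probability roughly one-third higher, and an over-award of roughly 20 percent) could be combined. The function name, the specific rule, and the example counts are all assumptions made for illustration, not the Program operator's actual procedure.

```python
def award_probabilities(slots, n_sini, n_non_sini,
                        sini_boost=4 / 3, over_award=1.20):
    """Return (p_sini, p_non_sini) for one grade band.

    Hypothetical rule: pick the non-SINI probability so that the expected
    number of awards equals slots * over_award, with the SINI probability
    set to sini_boost times the non-SINI probability. Both probabilities
    are capped at 1, so in lightly subscribed bands the expected number of
    awards can fall short of the target.
    """
    if min(slots, n_sini, n_non_sini) < 0:
        raise ValueError("counts must be non-negative")

    target_awards = slots * over_award
    # Each SINI applicant counts sini_boost times as much as a non-SINI one.
    weighted_applicants = n_non_sini + sini_boost * n_sini
    if weighted_applicants == 0:
        return 0.0, 0.0

    p_non_sini = min(1.0, target_awards / weighted_applicants)
    p_sini = min(1.0, sini_boost * p_non_sini)
    return p_sini, p_non_sini


# Illustrative (made-up) grade band: 100 slots, 90 SINI and 180 non-SINI applicants.
p_sini, p_non_sini = award_probabilities(slots=100, n_sini=90, n_non_sini=180)
expected_awards = p_sini * 90 + p_non_sini * 180
print(f"p_sini={p_sini:.3f}, p_non_sini={p_non_sini:.3f}, "
      f"expected awards={expected_awards:.1f}")
```

Under these assumptions the expected number of awards exceeds the number of slots by the over-award factor, which is the intent of over-awarding when some recipients are expected not to use their scholarships.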

In total, after the first 2 years of Program implementation, the WSF had awarded 
Opportunity Scholarships to 2,454 students. The total awards to the three priority subgroups were 

• 508 scholarship awards to public school students attending schools designated as SINI 
the year before they entered the lottery; 

• 1,730 scholarship awards to students in non-SINI public schools; and 

• 216 scholarship awards to students attending private schools but otherwise eligible for 
the Program (in the first year only). 



For example, the proportion of awarded scholarships that were actually used in the first year of the previous experiments in 
Washington, DC; Dayton, Ohio; and New York City ranged from 68 to 82 percent. 
The timing of the release of annual SINI designations complicated the task of assigning SINI 
applicants priority in the lotteries. To enable students who were offered scholarships to obtain a 
placement in a participating private school, scholarships had to be awarded in April-June of each 
implementation year. However, the list of District of Columbia public schools designated as in need of 
improvement each year was not released until August. Therefore, priority designations for the lotteries 
had to be based on the shorter list of schools designated SINI during the year prior to the lottery. About 
40 percent of applicants were from schools designated SINI for the year the students would be leaving 
those schools to participate in the OSP. For cohort 1, 37 percent were SINI, and for cohort 2, just under 
43 percent were SINI. In total, 44 percent of OSP applicants were from schools designated as SINI 
between 2003 and 2005, a period when the number of SINI schools in the District jumped from 15 to 101 
(table 2-2). 

Table 2-2. Percent of Public School Applicants From SINI Schools, Spring 2004 and Spring 2005 

Timing of SINI Designation             Total    Spring 2004   Spring 2005 

SINI in fall 2003 (N=15)               4.9      5.9           4.2 
SINI in fall 2004 (N=90)               36.9     37.1          36.7 
SINI in fall 2005 (N=101)              43.7     44.8          42.8 
All eligible public school applicants  3,159    1,343         1,816 

NOTE: The figures in bold are those most relevant for each cohort. 

SOURCES: The DC Opportunity Scholarship Program applications and the District of Columbia Public Schools Web site. 



Creation of the Impact Sample 

The impact sample is a direct result of the lotteries and the critical component of the 
legislatively required rigorous evaluation of the OSP. Impact evaluations compare the outcomes for a 
group of applicants or study participants, all of whom were randomly assigned to either receive access to 
the intervention (e.g., an OSP scholarship) or to not receive access. The lotteries conducted for some of 
the OSP applicants in years 1 and 2 satisfy these requirements. Since the intervention under consideration 
is an Opportunity Scholarship to attend a private school, the impact analysis focuses on the population of 
applicants for whom private schooling represented a new opportunity. Thus, the impact sample for this 
evaluation comprised all eligible applicants who were previously attending public schools (or were rising 
kindergartners) AND were subject to a lottery to determine whether they would receive an Opportunity 
Scholarship (figure 2-1, shaded area). 
Figure 2-1. Construction of the Impact Sample From the Applicant Pool, Cohorts 1 and 2 




NOTES: C1 = Cohort 1 (applicants in spring 2004) 
C2 = Cohort 2 (applicants in spring 2005) 
Total = C1 and C2 



‘‘The group of applicants who were not randomly assigned includes: in cohort 1, public school applicants from SINI schools 
or who were entering grades K-5 (all received a scholarship), and in cohort 2, private school applicants, the lowest priority group 
(none received a scholarship because it was clear the Program would be filled with higher priority public school applicants). 



The total pool of eligible applicants comprised 1,848 applicants in cohort 1 (spring 2004) 
and 2,199 applicants in cohort 2 (spring 2005). Of those eligible applicants, 492 in cohort 1 and 1,816 in 
cohort 2 met the criteria to be randomly assigned by lottery to the treatment and control groups. In cohort 
1, a total of 299 students were randomized into the treatment condition and 193 into the control condition. 
In cohort 2, some 1,088 students were randomized into the treatment condition and 728 into the control 
condition. The impact sample formed by these groups totals 2,308 students:^^ 1,387 students in the 
treatment condition and 921 in the control condition. The more than 2,300 students in the impact sample 
constitute a large group relative to the impact samples used in other evaluations of private school scholarship 
programs (803 to 1,960 students) (Howell and Peterson et al., 2002, p. 44). 
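
The totals above can be reproduced from the per-cohort lottery counts. A minimal sketch of the tally, using only figures reported in this chapter (the dictionary layout and variable names are illustrative):

```python
# Lottery assignments reported in the text, by cohort.
lottery_counts = {
    "cohort 1": {"treatment": 299, "control": 193},
    "cohort 2": {"treatment": 1_088, "control": 728},
}

# Eligible applicants across cohorts 1 and 2, as reported in table 2-1.
eligible_applicants = 4_047

treatment_total = sum(c["treatment"] for c in lottery_counts.values())
control_total = sum(c["control"] for c in lottery_counts.values())
impact_sample = treatment_total + control_total

# Share of all eligible applicants who were subject to a lottery.
impact_share = impact_sample / eligible_applicants

print(f"Treatment: {treatment_total}, control: {control_total}, "
      f"impact sample: {impact_sample}")
print(f"Impact sample as share of eligible applicants: {impact_share:.0%}")
```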



2.3 Characteristics of the Impact Sample 

The OSP impact sample group, 57 percent of eligible applicants, will provide the most 
reliable evidence regarding whether and how the OSP has influenced the educational experiences and 
outcomes of scholarship awardees.^’ The characteristics of this group are a factor in determining whether 
the random assignment was completed properly and possibly the extent of Program impacts. 



Overall Sample 

The students in the impact sample as a whole reflect the Program’s income eligibility criteria 
and priorities as specified in the authorizing legislation (table 2-3, second column): 



A total of five members of the cohort 1 control group were awarded scholarships by lottery in the summer of 2005 and a total 
of six members of the control group (cohorts 1 and 2) were awarded scholarships by lottery in the summer of 2006 as part of 
the control group follow-up lottery to reward control group members who cooperate with the evaluation’s testing requirements. 
Control group students who win a follow-up incentive lottery are included in the analysis until they have been offered a 
scholarship, at which point they are excluded from subsequent data collection activities because their initial random 
assignment has been deliberately undermined. The exclusion of the 11 control group lottery winners did not affect the size of 
the control group sample for this report, as they all provided 1-year outcome data before being awarded their scholarships; 
however, the control group population for all future impact studies will not include those 11 students. 

The subgroups of eligible applicants to the Program who did not fit the criteria for the impact sample include eligible 
applicants in cohorts 1 and 2 who were already attending private schools (n=888) and two groups of public school applicants in 
cohort 1 who were automatically awarded scholarships (n=851), specifically those from SINI public schools because of their 
high service priority and those applying from grades K-5 because there were sufficient private school slots in those grades to 
accommodate all of those applicants that year. The exclusion of these non-randomized subgroups from the impact evaluation 
has implications for the ability to generalize the study results. Since the experiences of students who sought and obtained a 
scholarship presumably to continue their private schooling will likely differ from public school applicants who are the subject 
of this experimental study, readers are cautioned not to generalize the results of this impact study to existing private school 
populations. In addition, the cohort 1 K-5 population that automatically received scholarships because their grade levels were 
not oversubscribed differ in important ways from the cohort 2 K-5 population that was randomized into the impact study. At 
baseline, the cohort 1 K-5 group was somewhat older, more likely to be from a SINI-ever school, African American, and have 
a special educational need, and had higher baseline test scores and family income than cohort 2 K-5 participants, who were 
more likely to be Hispanic (see appendix A). The cohort 1 applicants from SINI-designated schools numbered only 79. The 
655 cohort 2 applicants from SINI-designated schools provide a great deal of information about the impact of the Program on 
SINI participants. Despite the differences in the population of program participants inside and outside of the evaluation 
described above, the impact sample still retains characteristics of economic and social disadvantage that are common to urban 
families targeted for participation by school choice programs. 
Table 2-3. Impact Sample Mean Characteristics at Baseline 

Characteristic                      Impact sample   Treatment     Control       Difference   Pr > F 

Achievement (scale score): 
Cohort 1 reading                    661.04          658.89        663.61        -4.73        .18 
  Percent missing (N=492)           24.19           21.40         26.94         -5.84 
Cohort 2 reading                    590.88          587.34        595.74        -8.39        .09 
  Percent missing (N=1,816)         27.81           28.40         26.92         1.48 
Cohort 1 math                       662.36          660.40        664.46        -4.05        .28 
  Percent missing (N=492)           23.17           20.40         27.46         -7.06 
Cohort 2 math                       566.58          562.90        571.63        -8.73        .10 
  Percent missing (N=1,816)         17.35           16.36         18.82         -2.46 

Student demographics (percent): 
SINI ever                           43.91           43.89         43.93         -0.04        .98 
  Percent missing                   0               0             0             0 
Special needs                       12.25           12.29         12.21         0.08         .96 
  Percent missing                   9.27            9.66          8.69          0.97 
African American                    87.52           86.81         88.36         -1.49        .28 
  Percent missing                   1.52            1.51          1.52          0 
Hispanic                            9.41            10.57         8.05          2.52         .05* 
  Percent missing                   1.52            1.51          1.52          -0.01 
Female                              50.37           49.29         51.63         -.19         .28 
  Percent missing                   .30             .22           .43           3.77 

Family demographics (percent): 
Mother HS diploma                   79.65           78.24         81.23         -2.99        .11 
  Percent missing                   15.08           16.87         12.38         4.49 
Mother 4-yr degree                  5.95            6.06          5.83          0.23         .84 
  Percent missing                   15.08           16.87         12.38         4.49 
Mother full-time job                57.63           57.55         57.74         -0.19        .93 
  Percent missing                   15.94           17.45         13.68         3.77 

Family demographics (mean): 
Family income                       $17,356.00      $17,192.00    $17,549.00    $-357.10     .43 
  Percent missing                   0               0             0             0 
Number of children                  2.91            2.89          2.94          -0.05        .43 
  Percent missing                   0.43            .43           .43           0 
Months of residential stability     76.62           76.20         77.13         -0.93        .81 
  Percent missing                   2.82            2.74          2.93          -0.19 

Sample size (unweighted)            2,308           1,387         921 

* Statistically significant at the 95 percent confidence level. 

NOTES: These data are weighted. See chapter 3 for a discussion of the weighting process. 

SOURCES: The DC Opportunity Scholarship Program applications, the 2004 DCPS Accountability Testing Database, and 2005 
administration of the SAT-9 by evaluation staff. 
• The average impact sample student at the time of application had a reading scale score 
of 608 and a math scale score of 588, which equate to the 33rd National Percentile 
Rank (NPR) in reading and the 31st NPR in math.^ 

• About 44 percent of the impact sample was attending public schools designated SINI 
between 2003 and 2005. 

• Twelve percent were described by their parents as having special needs. 

• Nearly 90 percent were identified by their parents as African American, and 9 percent 
were identified as being of Hispanic ethnicity. 

• They are evenly divided between males and females. 

• Although 80 percent of their mothers reported having a high school diploma, only 6 
percent said they had a bachelor’s degree; 58 percent of the mothers reported working 
full time. 

• Their average household at the time of application had almost three children supported 
by an annual income of $17,356. 



Treatment vs. Control Groups 

An important strength of experimental methods of analysis is that the assignment of study 
participants to the treatment and control groups creates two analytic groups that are, on average, 
statistically similar at the time of random assignment. The treatment, in this case the offer of an 
Opportunity Scholarship, is provided to one group, and any subsequent differences in outcomes observed 
between the two groups can be ascribed to the impact of that treatment. 

To see how the random assignment process works, we compare the characteristics of the 
treatment and control groups as measured at baseline, prior to the Program intervention. The subgroup of 
students in the impact sample randomly assigned to the treatment group (table 2-3, treatment group 
column) is statistically similar to the randomized control group (table 2-3, control group column) in all 
but one instance. This pattern of characteristics across the treatment and control components of the 
overall impact sample is consistent with what we would expect from scholarship lotteries designed to 
generate a representative treatment group of scholarship awardees and a statistically similar comparison 
group of control students. 

The average NPRs for the impact sample were computed by taking the weighted average NPRs within the various grades using 
pre-imputation baseline test scores. 

Randomized groups are not necessarily identical; however, when they do occasionally differ significantly regarding a certain 
characteristic, the difference is due to chance and not because of the decisions or behaviors of study participants. 

The one exception was Hispanic ethnicity, as the treatment group has a higher proportion of members who self-identify as 
Hispanic than does the control group. Statistical theory predicts that 1 significant difference out of 16 in group characteristics 
is what would be expected by chance. 
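
The chance-difference reasoning in the note on Hispanic ethnicity can be checked with a little arithmetic. The sketch below assumes that each of the 16 baseline comparisons is an independent test at the 0.05 significance level; independence is a simplifying assumption that is not stated in this report, and the variable names are illustrative only.

```python
# Expected number of chance "significant" differences among baseline comparisons.
# Assumption (not from the report): the 16 tests are independent of one another.
ALPHA = 0.05   # significance level, i.e., the 95 percent confidence level
N_TESTS = 16   # number of group characteristics compared at baseline

# On average, ALPHA * N_TESTS comparisons appear significant by chance alone.
expected_false_positives = ALPHA * N_TESTS

# Probability that at least one comparison appears significant by chance.
prob_at_least_one = 1 - (1 - ALPHA) ** N_TESTS

print(f"Expected chance differences: {expected_false_positives:.1f}")
print(f"Probability of at least one: {prob_at_least_one:.2f}")
```

Under these assumptions, finding roughly one significant difference among 16 comparisons is consistent with chance variation in a well-run lottery.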



2.4 Schools Attended by OSP Applicants 



Nearly two-thirds of District private schools have agreed to participate in the OSP. 



• 58 (53 percent) of the 109 private elementary and secondary schools in DC in 2004 
agreed to participate in the Program in the first year of implementation. 

• 68 (65 percent) of the 104 District private schools in 2005, including all the schools 
that participated in the first year, chose to participate in the OSP during the second year 
of implementation. 

• Of the 68 participating schools in fall 2005, 60 (88 percent) had OSP students enrolled 
at that time. 

• Members of the impact sample were attending 47 (69 percent) of the 68 participating 
schools 1 year after being awarded their scholarships. 
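
The participation shares in the list above are simple ratios of the reported counts. A minimal sketch that reproduces them follows; the helper name `pct` and the dictionary labels are illustrative, not part of the evaluation.

```python
def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Counts taken from the list above: (numerator, denominator).
participation = {
    "participating in year 1, of DC private schools in 2004": (58, 109),
    "participating in year 2, of DC private schools in 2005": (68, 104),
    "participating schools with OSP students in fall 2005": (60, 68),
    "participating schools attended by the impact sample": (47, 68),
}

for label, (part, whole) in participation.items():
    print(f"{label}: {part} of {whole} = {pct(part, whole)} percent")
```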



Characteristics of All Participating Schools 



The private schools participating in the OSP represent the choice set presented to parents 
whose children received scholarships. As such, the features of these schools, whether or not any OSP 
students enrolled in them, are relevant to a description of the OSP as a school choice program. 



The religious status and affiliation of the participating schools vary (figure 2-2): of the 68 
private schools participating in the Program in 2005-06, 23 (34 percent) were schools of the Catholic 
Archdiocese of Washington, 15 (22 percent) were non-Catholic faith-based schools, 16 (24 percent) were 
members of the Association of Independent Schools of Greater Washington (AISGW), and 14 (21 
percent) were independent private schools that were neither faith-based nor members of the AISGW. 



Since factors such as attendance at a SINI-designated school and grade band affected each student’s probability of being 
awarded a scholarship, these calculations of subgroup averages are based on observations that have been weighted to eliminate 
any compositional differences induced by the differential treatment probabilities within priority and grade band strata. 
Additional detail regarding the weighting procedures is provided in the next chapter. 

Many members of the AISGW have religious names and traditions; however, they all operate independently of any direct 
influence by the authorities of a particular sectarian religion. Therefore, they have been classified as a particular category of 
school, neither fully faith-based nor fully non-faith based. These figures differ slightly from those originally presented in the 
first- and second-year descriptive reports on the Program, since those reports used a less differentiated classification scheme of 
Catholic, non-Catholic religious, secular, and unknown. 
The share of faith-based schools among the private schools participating in the Program dropped from 
64 percent in 2004-05 to 56 percent in 2005-06. 

Figure 2-2. Number of Participating Schools by Religious Affiliation and by Year 
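[Figure 2-2: chart of the number of participating schools in each year, by religious affiliation. 
Horizontal axis: religious affiliation, with categories including Catholic, non-Catholic faith-based, 
and Association of Independent Schools of Greater Washington.] 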



NOTES: Schools are defined as “participating” if they signed an agreement form to 
accept scholarship students. No schools dropped out of the Program between 
2004-05 and 2005-06. Five schools did not identify their religious affiliation in 
either year. 

SOURCES: “School Directory, D.C. K-12 Scholarship Program, 2004-05 School Year,” 
Washington Scholarship Fund, June 2004. “School Directory, D.C. 
Opportunity Scholarship Program, 2005-06 School Year,” Washington 
Scholarship Fund, August 2005. 



The 10 schools that joined the Program in year 2 are different in several other respects from 
the 58 schools that have participated from the start. On average, the new schools are more likely to charge 
tuition above the scholarship cap of $7,500, more likely to serve one or more high school grades, have a 
smaller percentage of racial minorities among their student populations, and are larger than the original 
group of participating schools (table 2-4). The average student/teacher ratios of the two groups of 
participating schools are statistically similar. 

The 36 District private schools that are not currently participating in the Program differ from 
the total set of participating schools in some respects. Non-participating schools are more likely to charge 
average tuitions above the scholarship cap, have smaller enrollments, serve a smaller minority population, 
and have lower student/teacher ratios than the average among the participating schools (table 2-4). The 
group of non-participating private schools includes several highly specialized schools, such as a 
ballet school, as well as schools that exclusively serve students with significant disabilities. 

Table 2-4. Features of DC Private Schools by OSP Participation Status, Years 1 and 2 



Item                                           Participated   New Participants   Total Participants   Non-participants
                                               Year 1         Year 2             Year 2               Year 2
Percent with average tuition(a) above $7,500   31.0(b)        88.9**             38.8                 73.7**
Average size (student enrollment)              204.0          418.4**            236.0                137.6*
Percent serving high school(c)                 17.2           50.0*              22.1                 32.4
Average percent minority                       81.2           35.3**             76.2                 56.6*
Average student/teacher ratio                  10.9           8.5                10.6                 7.8*
Total N                                        58             10                 68                   36



*Statistically significant at the 95 percent confidence level. 

**Statistically significant at the 99 percent confidence level. 

(a) For schools that charge a range of tuitions, the midpoint of the range was selected. Tuition rates were unavailable for 8 of the 
participating private schools and 26 of the non-participating private schools. 

(b) Three schools charged no tuition either because of foundation support or because the school serves groups such as DC-placed 
special education students funded by the government. 

(c) Schools were classified as serving high schools if they enrolled students in any grade 9-12. 

Asterisks in column three denote characteristics of the set of newly participating schools (year 2) that are significantly different 
from those of the set of originally participating schools (year 1). Asterisks in column five denote characteristics of the set of 
non-participating schools that are significantly different from those of the set of all year 2 participating schools. 

SOURCES: Data on participating private schools drawn from “School Directory, D.C. K-12 Scholarship Program, 2004-05 
School Year,” and “School Directory 2005-06, D.C. Opportunity Scholarship Program,” Washington Scholarship 
Fund, June 2004 and April 2005, respectively. Data on both participating and non-participating private schools were 
also obtained from school Web sites. 



Characteristics of Participating Schools Attended by the Impact Sample 

Students in the impact sample were all attending DC public schools or were rising 
kindergartners in the year they applied for the OSP (table 2-5). As reported previously, 44 percent of the 
impact sample were attending schools designated SINI in 2003-05. After 1 year, 77 percent of the 
students awarded a scholarship were attending a participating private school. Fifteen percent of the 
students who were not awarded a scholarship were enrolled in a private school. Of these control group 
students attending private school, 73 percent were attending schools participating in the OSP. 

When such highly specialized private schools are excluded from the population, there are 88 “general service” private schools 
in the District of Columbia, of which 68 (77 percent) participated in the OSP in the second year of implementation. 

This figure is based on student-level data weighted to account for differential rates of response to the parent survey about the 
schools students were attending. For the unweighted data, the comparable figure is 75 percent of control group students 
attending private schools are in schools participating in the OSP. 
That is, as has been true in other scholarship programs, not all treatment group students chose 
to use their scholarship to attend a private school (at all or continuously), and some students in the control 
group found their way into private schools even without a Program scholarship. 

Table 2-5. Type of School Attended by the Impact Sample, Year of Application and 1 Year Later 







                                             Baseline Year                       First Follow-Up Year
School attending (percent of students in):   Treatment   Control     Difference  Treatment   Control     Difference
  Private schools                            0.00        0.00        0.00        77.12       15.21       61.91**
  Public schools                             100.00      100.00      0.00        22.88       84.79       -61.91**
  SINI-ever schools                          43.89       43.93       -0.04       9.80        36.41       -26.61**
  SINI-never schools                         56.11       56.07       0.04        13.07       48.38       -35.30**
  Percent missing                            0.00        0.00        0.00        1.81        0.74        1.07



**Statistically significant at the 99 percent confidence level. 



NOTE: Data are weighted. For a description of the weights, see chapter 3. 

SOURCES: The DC Opportunity Scholarship Program Applications, the Impact Evaluation Parent Survey (for school attended), 
and the Impact Evaluation Principal Survey. 



The members of the impact sample who used their OSP scholarship and also responded to 
data collection during year 1 were enrolled in 47 of the 68 participating schools 1 year after being 
awarded their OSP scholarships. Since participating schools varied in how many slots they committed to 
the Program, OSP students tended to cluster in certain participating schools with characteristics that 
differed somewhat from the “typical” participating OSP school. In other words, the student-weighted 
average characteristics of schools attended by OSP students differed somewhat from the school-weighted 
average characteristics of the set of OSP schools (table 2-6). 
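
The distinction between student-weighted and school-weighted averages can be illustrated with a small sketch. The three schools and their figures below are entirely hypothetical and are not drawn from the evaluation data; they only show how clustering of students in certain schools moves the student-weighted mean away from the school-weighted mean.

```python
# Hypothetical data: (school enrollment, number of OSP students attending).
schools = [(100, 30), (250, 5), (400, 1)]

# School-weighted mean: every participating school counts once.
school_weighted = sum(size for size, _ in schools) / len(schools)

# Student-weighted mean: each school counts once per OSP student attending it,
# which mirrors attaching school characteristics to each student and averaging.
total_students = sum(n for _, n in schools)
student_weighted = sum(size * n for size, n in schools) / total_students

print(f"School-weighted mean enrollment:  {school_weighted:.1f}")
print(f"Student-weighted mean enrollment: {student_weighted:.1f}")
```

Because most of the hypothetical students attend the smallest school, the student-weighted mean falls well below the school-weighted mean.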



In order to link a specific OSP student to the characteristics of his/her school, the student had to be an OSP scholarship user in 
the impact sample who also responded to follow-up data collection — including the survey question about the name of the 
school he/she was attending. The year 1 impact sample comprised 52 percent of all scholarship users that year. Year 1 survey 
response rates among impact sample scholarship users were 94 percent for cohort 1 and 91 percent for cohort 2. Thus, the 
information presented here about the schools OSP students are attending represents a selective sample of all students using 
OSP scholarships. These student-weighted data are being presented to describe what the OSP scholarship users in the impact 
sample are experiencing in terms of the new schools they are attending. Because this sample of students is not fully 
representative of the OSP in general, readers should not generalize from these student-weighted school characteristics 
to the characteristics of the Program as a whole. 
Table 2-6. Features of Participating Private Schools Attended by the Treatment Group 

Characteristic                           Weighted Mean   Highest     Lowest      Valid N
Percent of OSP students attending a 
  school charging over $7,500 tuition    8.4                                     47
Tuition                                  $5,253          $24,545     $3,400      47
Enrollment                               177             1,072       5           47
Percentage of student body from 
  racial/ethnic minority groups          95%             100%        16%         47
Average student/teacher ratio            11.8            29.6        2.6         46
Student N                                941



NOTES: “Valid N” refers to the number of schools for which information on a particular characteristic was available. When a 
tuition range was provided, the mid-point of the range was used. The weighted mean was generated by associating 
each student with the characteristics of the school he/she was attending, then computing the average of these student- 
level characteristics. A total of 23.7 percent of the data were missing for each of the characteristics. The private 
school that enrolled only five students is a Montessori school serving children in pre-K and kindergarten only. 

SOURCES: National Center for Education Statistics: Private School Universe Survey, 2003-2004, supplemented by OSP School 
Directory information, 2004-05, 2005-06, Washington Scholarship Fund. 



Only 8.4 percent of this group of OSP students was attending a school that charged tuition 
above the statutory cap of $7,500 in their first year in the Program, even though 39 percent of 
participating schools charged tuitions above the cap by fall of 2005. The average tuition charged to these 
treatment group students who used their scholarships was $5,253 but varied between $3,400 and 
$24,545. The average OSP student in this group attended a school with 177 students — somewhat 
smaller than the average of 236 students across the set of participating schools. These OSP students are 
concentrated in the participating private schools with higher minority enrollments but with student/teacher 
ratios that are approximately representative of the entire set of OSP schools. 

Most of the scholarship users in the impact sample who responded to year 1 data collection 
are attending Roman Catholic schools (figure 2-3). Nearly two-thirds of these OSP students are attending 
the one-third of the participating OSP schools operated by the Catholic Archdiocese of Washington. 
About 17 percent are attending non-Catholic faith-based schools, and 18 percent are enrolled in 
nonsectarian private schools. 
The WSF reported that families were not required to pay for tuition out-of-pocket in almost all cases where the tuition charged 
by the school exceeded the $7,500 cap. 
Figure 2-3. Religious Affiliation of Participating Private Schools 
Attended by the Treatment Group 




SOURCES: National Center for Education Statistics: Private School Universe Survey, 
2003-2004, supplemented by OSP School Directory information, 2004-05, 
2005-06, Washington Scholarship Fund. 



In interpreting the presence or absence of Program impacts, it is important to understand the 
difference between the treatment and control groups in their educational environments and experiences. 
Examining the characteristics of the schools attended by students in the treatment and control groups 
suggests the following (table 2-7): 



• There were no significant differences between treatment and control students in the 
characteristics of the public schools they attended at the time of application. 

• One year later, a similar proportion of students in the treatment and control groups 
were attending schools that offered libraries, gyms, special programs for advanced 
learners, individual tutors, art programs, and after-school programs. 

• One year later, students in the treatment group were more likely than those in the 
control group to have a computer lab or music program available to them at school. 
The treatment group was less likely to have access at school to a cafeteria, nurse’s 
office, counselors, or special programs for non-English speakers or students with 
learning problems. 



This information is from principal reports of the availability of facilities and programs in their school. 
Table 2-7. Characteristics of School Attended by the Impact Sample, Year of Application and 1 Year Later 

                                                        Baseline Year                       First Follow-Up Year
Percentage of Students Attending a School with:         Treatment   Control     Difference  Treatment   Control     Difference
Facilities:
  Computer lab                                          73.53       72.90       0.63        95.51       89.13       6.38**
  Library                                               80.12       78.07       2.05        79.52       77.33       2.19
  Gym                                                   63.67       66.15       -2.48       70.95       67.38       3.57
  Cafeteria                                             87.39       88.68       -1.29       74.15       87.95       -13.80**
  Nurse's office                                        87.43       88.51       -1.08       29.27       84.53       -55.26**
  Percent missing                                       6.84        7.74        -0.89       35.42       42.38       -6.96
Programs:
  Special program for non-English speakers              48.62       44.15       4.47        18.60       57.10       -38.50**
  Special program for students with learning problems   64.35       65.58       -1.23       51.14       88.72       -37.58**
  Special program for advanced learners                 38.65       35.43       3.22        42.50       37.85       4.65
  Counselors                                            80.50       80.08       0.43        75.39       82.11       -6.72**
  Individual tutors                                     36.58       39.10       -2.51       78.10       77.89       0.22
  Music program                                         70.14       70.60       -0.47       93.57       74.82       18.75**
  Art program                                           69.18       66.66       2.52        84.23       81.45       2.78
  After-school program                                  79.98       79.31       0.67        94.73       93.31       1.43
  Percent missing                                       7.16        7.89        -0.73       34.41       42.21       -7.80
Sample size (unweighted)                                1,387       921         466         1,387       921         466



**Statistically significant at the 99 percent confidence level. 



NOTES: Data are weighted. For a description of the weights, see chapter 3. 

SOURCES: The DC Opportunity Scholarship Program applications, the Impact Evaluation Parent Survey (for school attended), 
and the Impact Evaluation Principal Survey. 
3. Research Methodology 



The evaluation of the DC Opportunity Scholarship Program (OSP) is designed as a 
randomized control trial (RCT) or experiment. Experimental evaluations take advantage of a 
randomization process that divides a group of potential participants into two statistically similar 
groups — a treatment group that receives admission to the intervention or program and a control group that 
does not receive admission, with the control group’s subsequent experiences indicating what probably 
would have happened to the members of the treatment group in the absence of the intervention. Most 
analyses of experimental data use covariates measured at baseline in statistical models to improve the 
precision of the impact estimates. The results can then be interpreted in relatively straightforward ways as 
revealing the actual impact of the program on outcomes of policy interest. This chapter describes the 
central features of the evaluation’s research design, the sources and treatment of data (including why and 
how the data were adjusted to maintain sample balance), and how the data were analyzed in order to 
identify program impacts. 



3.1 The “Treatment” and the “Counterfactual” 

The primary purpose of this evaluation is to assess the impact of the DC Opportunity 
Scholarship Program. The impact is defined as the difference between outcomes observed for scholarship 
awardees and what would have been observed for these same students had they not been awarded a 
scholarship. Although it is impossible to observe the same individuals in these two different situations, if 
random assignment is well implemented, the students who were offered scholarships will not differ in any 
systematic or unmeasured way from the group of non-awardees, except for the fact that they were offered 
scholarships. More precisely, there may be some non-programmatic differences between the two groups, 
but the expected or average value of these differences is zero because they are the result of mere chance. 
Under this design, a simple comparison of outcomes for the two groups yields an unbiased estimate of the 
effect of the treatment condition, in this case an unbiased estimate of the impact of the OSP on various 
outcomes of interest. 

It is important, however, to keep in mind the precise definition of the treatment and what it is 
being compared to (referred to as the counterfactual) because it is the difference in outcomes under these 
two conditions that leads to the estimated impact of the Program. 
• The treatment is the award or offer of an Opportunity Scholarship, which is all the 
Program can do. The Program does not compel students to actually use the scholarship 
or make them move from a public to a private school. Therefore, the Program’s 
estimated average impact includes the reality that some students who are offered a 
scholarship will, in fact, be disinclined to use it (what we refer to as “decliners”). 

• In the same way, the counterfactual or control group condition is defined as applying 
for, but not being awarded, an Opportunity Scholarship. Students randomized into this 
group are not prevented from moving to a private school on their own, if the family 
opts to use its own resources or if the student is able to obtain another type of 
scholarship from an entity other than WSF. Such independent access to a private 
school education, or to a non-Opportunity Scholarship, is not a violation of random 
assignment but a correct reflection of what probably would have happened in the 
absence of the new Program, i.e., that some students in the applicant pool would have 
found a way to attend a private school on their own. 

While these two study conditions and their comparison represent the main impact analysis 
approach, often called the “intent-to-treat” (ITT) analysis, the evaluation also provides separate estimates 
of the impact of the OSP on that subset of children who actually used the scholarship, referred to as 
estimated “impacts-on-the-treated” (IOT). In addition, the evaluation estimates the effects of actually 
attending a private school, regardless of whether an OSP scholarship is used. 
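
The relationship between an ITT estimate and an impact on scholarship users can be sketched as follows. This section does not specify the estimator the evaluation uses, so the sketch relies on one common approach, often called the Bloom adjustment, as an assumption: it divides the ITT estimate by the share of the treatment group that used the treatment, on the premise that the offer has no effect on students who decline it. The function name, the ITT value, and the use of the 77 percent figure from section 2.4 as a usage rate are all illustrative.

```python
def bloom_iot(itt_estimate: float, usage_rate: float) -> float:
    """Rescale an intent-to-treat estimate to an impact on those who used the treatment.

    Assumes the offer has no impact on students who decline it (Bloom adjustment).
    """
    if not 0 < usage_rate <= 1:
        raise ValueError("usage_rate must be in the interval (0, 1]")
    return itt_estimate / usage_rate


# Hypothetical ITT impact in scale score points; not a result from this report.
hypothetical_itt = 3.0

# Illustrative usage rate: share of the treatment group attending a
# participating private school 1 year later, as reported in section 2.4.
usage_rate = 0.77

iot = bloom_iot(hypothetical_itt, usage_rate)
print(f"ITT = {hypothetical_itt:.2f}, implied IOT = {iot:.2f}")
```

Because the same average difference is attributed to a smaller group of students, the implied impact on users is larger in magnitude than the ITT estimate whenever some awardees decline.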



3.2 Study Power 

To ensure that the experimental evaluation of program impact will produce reliable findings, 
the sample size must be large enough to enable the analysis to answer the study’s central questions and to 
measure program effects that are large enough to be both meaningful in students’ lives and relevant to 
policy debates about the efficacy of educational interventions. The ability of a study to do so is a function of 
the study’s precision or “power.” 

Minimum detectable effects (MDEs) are a simple way to express the statistical precision of 
an impact study design. Intuitively, a minimum detectable effect is the smallest program impact or “effect 
size” that could be measured with confidence given random sampling and statistical estimation error. 



We define a minimum detectable effect as the smallest true program impact that would have an 80 percent chance of being 
detected (have 80 percent power) using a two-tail hypothesis test at the 0.05 level of statistical significance. We use a two-tail 
test because it is conceivable that the scholarship program could have either a negative or positive effect on test scores, even 
though the policy question is about improved test scores. 
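
The definition of a minimum detectable effect given above can be expressed as a multiple of the standard error of the impact estimate. The sketch below uses a normal approximation, which is an assumption made here for simplicity; the function names and the standard error value are hypothetical and are not taken from the study's power calculations in appendix B.

```python
from statistics import NormalDist


def mde_multiplier(alpha: float = 0.05, power: float = 0.80) -> float:
    """Multiple of the standard error detectable with the given power, two-tail test."""
    standard_normal = NormalDist()
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)  # two-tail test at alpha
    power_value = standard_normal.inv_cdf(power)             # desired power
    return critical_value + power_value


def minimum_detectable_effect(standard_error: float,
                              alpha: float = 0.05,
                              power: float = 0.80) -> float:
    """Smallest true impact detectable with the given power (normal approximation)."""
    return mde_multiplier(alpha, power) * standard_error


multiplier = mde_multiplier()
hypothetical_se = 0.04  # hypothetical standard error, in standard deviation units
mde = minimum_detectable_effect(hypothetical_se)
print(f"Multiplier: {multiplier:.2f}; MDE for SE {hypothetical_se}: {mde:.3f}")
```

A smaller standard error, for example from a larger sample or from baseline covariates that explain outcome variance, lowers the minimum detectable effect proportionally.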
Before the collection and analysis of outcome data, study MDEs were estimated based on 
assumptions about response rates and the statistical relationships among the data (appendix B). These 
initial estimates signaled clearly that the first cohort of 492 randomized participants, alone, would 
generate insufficient study power. It was determined that combining the outcome data from cohort 1 with 
those of cohort 2 would greatly enhance the power of the evaluation, so that achievement impacts of even 
quite modest size in the full study sample (e.g., 0.11 standard deviation), and impacts of moderate size 
within policy relevant subgroups (e.g., 0.13-0.19 standard deviations), would be detectable if the Program 
were actually producing them. 



3.3 Sources of Data, Outcome Measures, and Baseline Covariates 

A variety of data are necessary to address the research questions specified in the authorizing 
legislation, and these data are being used in different ways in the analysis. Because the two cohorts are 
separated by a year, the annual impacts will represent data collected in different calendar years 
(table 3-1). 

Table 3-1. Alignment of Cohort Data with Impact Years 



Annual Impact    Cohort 1                       Cohort 2
                 (Spring 2004 applicants)       (Spring 2005 applicants)
                 Spring 2004 (baseline)         Spring 2005 (baseline)
Year 1 impact    Spring 2005 (1st follow-up)    Spring 2006 (1st follow-up)
Year 2 impact    Spring 2006 (2nd follow-up)    Spring 2007 (2nd follow-up)
Year 3 impact    Spring 2007 (3rd follow-up)    Spring 2008 (3rd follow-up)
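
The alignment in table 3-1 follows a simple rule: each cohort's follow-up data for a given impact year are collected that many springs after its baseline spring. A minimal sketch of this mapping follows; the dictionary and function names are illustrative.

```python
# Baseline (application) spring for each cohort, from table 3-1.
COHORT_BASELINE_YEAR = {1: 2004, 2: 2005}


def follow_up_spring(cohort: int, impact_year: int) -> int:
    """Return the calendar year of the spring data collection for an impact year.

    impact_year 0 is the baseline; 1, 2, and 3 are the annual follow-ups.
    """
    if impact_year < 0:
        raise ValueError("impact_year must be 0 (baseline) or greater")
    return COHORT_BASELINE_YEAR[cohort] + impact_year


for year in (1, 2, 3):
    print(f"Year {year} impact: cohort 1 spring {follow_up_spring(1, year)}, "
          f"cohort 2 spring {follow_up_spring(2, year)}")
```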



Sources of Data 

Comparable data are being collected for each student in the impact sample regardless of 
whether the student is in cohort 1 or 2 or was randomly assigned to treatment or control. Data collection 
includes the following: 



To place these estimated effect sizes in context, an effect of 0.13 to 0.15 of a standard deviation in math equates to a National 
Percentile Rank (NPR) difference of 3.40 to 3.92 NPR points. For example, because the control group was, on average, at the 
35th percentile in math at baseline, a gain of 3.92 NPRs would bring its average performance up to about the 39th percentile. 
Such a gain is likely to be considered modest but educationally meaningful. 
• Student assessments. Baseline measures of student achievement in reading and math 
for public school applicants came from the SAT-9 standardized assessment 
administered by the DCPS as part of its spring testing program for cohort 1 and from 
the SAT-9 standardized assessment administered by the evaluation team in the spring 
for cohort 2. Each spring after the baseline year, the evaluation team is administering 
the SAT-9 to all cohort 1 and 2 students who were offered a scholarship as well as all 
members of the control group who did not receive a scholarship. The testing takes 
place primarily on Saturdays, during the spring, in locations throughout the city 
arranged by the evaluators. The testing conditions are similar for members of the 
treatment and control groups, and the test administrators hired and trained by the 
evaluation team do not know whether specific students are members of the treatment or 
control groups. The standardized testing in reading and math provides the outcome 
measures for student achievement. 

• Parent surveys. The OSP application included baseline surveys for parents applying 
to the program. These surveys were appended to the OSP application form, and 
therefore were completed at the time of application to the Program. Each spring after 
the baseline year, surveys of parents of all applicants are being conducted at the 
Saturday testing events, while parents are waiting for their children to complete their 
outcome testing. The parent surveys provide the self-reported outcome measures for 
parental satisfaction and safety. Other topics include reasons for applying, school 
involvement, educational climate, and curricular offerings at the school. 

• Student surveys. Each spring after the baseline year, surveys of students in grades 4 
and above are being conducted at the outcome testing events. The student surveys 
provide the self-reported outcome measures for student satisfaction and safety. 
Additional topics include attitude toward school, school environment, friends and 
classmates, and individual activities. 



For cohort 1 at baseline, students in grades not tested by DCPS were contacted by the evaluation team and asked to attend 
Saturday testing events where the SAT-9 was administered to them. Fill-in baseline test scores were obtained for 70 percent of 
the targeted students. Combined with the scores received from DCPS, baseline test scores were obtained from 76 percent of the 
cohort 1 impact sample in reading and 77 percent in math. In the school year for which cohort 2 families applied for the OSP, 
the DCPS assessment program was in transition, and fewer grades were tested. As a result, the evaluation team attempted to 
administer the SAT-9 to all eligible applicants entering grades kindergarten through 12 at Saturday testing sessions in order to 
obtain a comprehensive and comparable set of baseline test scores for this group. Baseline test scores were obtained from 72 
percent of the cohort 2 impact sample in reading and 83 percent in math. Baseline test score response rates in reading were 79 
percent for the cohort 1 treatment group and 73 percent for the cohort 1 control group, a difference of 6 percentage points. In 
math, the cohort 1 treatment response rate at baseline was 80 percent, 7 percentage points above the control rate of 73 
percent. For cohort 2, baseline test score response rates were lower for the treatment group than for the control group in 
reading — 72 percent compared to 73 percent — but higher in math — 84 percent for the treatment group versus 81 percent for 
the control group. For the combined cohort impact sample, the baseline response rates in reading were 73 percent for both the 
treatment and control groups. In math, the combined cohort response rate was 83 percent for the treatment group and 79 
percent for the control group. Although the SAT-9 is not available for students below first grade, Stanford Achievement does 
offer similar tests that are vertically equated to the SAT-9 for younger students. We administered these tests — the SESAT 1 for 
rising kindergartners and the SESAT 2 for current kindergartners (i.e., rising first graders). 

The levels of response to the baseline parent surveys varied somewhat by item. All study participants provided complete 
baseline data regarding characteristics that were central to the determination of eligibility and priority in the lottery, such as 
family income and grade level. Response rates were very high (98-99 percent) for baseline survey items associated with the 
basic demographic characteristics of participating students, such as age, race, ethnicity, and number of siblings. Baseline 
survey response rates were lower (85-86 percent) for items concerned with the education and employment status of the child’s 
mother. The baseline survey response rates for the treatment and control groups did not differ systematically. 
• Principal surveys. Each spring, surveys of principals of all public and private schools 
operating in the District of Columbia are being conducted. Topics include self-reports 
of school organization, safety, climate, principals’ awareness of and response to the 
OSP, and, for private school principals, why they are or are not participating in the 
OSP. Information from the principal surveys will be analyzed in future reports to 
describe what is happening within the public and private schools in DC, possibly as a 
result of the operation of the OSP. In addition, in later reports information from 
principals of impact sample members (treatment and control group) will be used to 
assess the relationship between school characteristics and impacts. 



Outcome Measures 



Congress specified in the Program statute that the rigorous evaluation study possible impacts 
regarding academic achievement, school safety, and satisfaction. For this first year impact report, impact 
estimates were produced for all three of these outcome domains: (1) academic achievement (two 
measures); (2) parent self-reports of school safety (one measure) and student self-reports of school safety 
(one measure); and (3) parental self-reports of satisfaction (three measures) and student self-reports of 
satisfaction (three measures). As in this report, previous studies of scholarship program impacts have used 
multiple measures of the outcomes of interest because achievement, safety, and satisfaction are constructs 
that often cannot be measured completely or well using any single indicator (see Mayer et al., 2002; 
Witte, 2001). All outcome data were obtained from impact sample respondents in the spring of their first 
year after random assignment and include the following: 



• Academic outcomes. The academic outcomes used in these analyses are assessments 
of student academic achievement in reading/language arts and mathematics derived 
from the administration of the Stanford Achievement Test, 9th Edition (SAT-9) by 
Westat-trained staff. Like most norm-referenced tests, the SAT-9 includes subtests 
within the reading and math domains in most grades; e.g., in grades 3-8, the reading 
test comprises reading vocabulary and reading comprehension, while the math test 
consists of math problem solving and math procedures. This norm-referenced test is 
designed to measure how a student’s performance compares with the scores of other 
students who took the test for norming purposes. Each student’s performance is 
measured using scale-scores that are derived from item response theory (IRT) item- 
pattern scoring methods, which use all of the information contained in a student's 
pattern of item responses to compute an individual’s score. These scores have an 



The law requires the evaluation to use as its academic achievement measure the same assessment DCPS was using the first 
year the OSP was implemented, which was the SAT-9. 

The norming sample for the SAT-9 included students from the Northeastern, Midwestern, Southern, and Western regions of 
the United States and is also representative of the Nation in terms of ethnicity, urbanicity, socio-economic status, and students 
enrolled in private and Catholic schools. The norming sample is representative of the Nation, but not necessarily of DC or of 
low-income students. Scale scores are vertically integrated across grades, so that scores tend to be higher in the upper grades 
and lower in the lower grades. For example, the mean and standard deviation (SD) for the norming population is 463.8 
(SD=38.5) for kindergartners tested in the spring, compared to 652.1 (SD=39.1) for fifth graders and 703.6 (SD=36.5) for 
students in twelfth grade. 
additional property called “vertical equating,” which allows scores to be compared 
across a grade span (e.g., K-12) to measure changes over time (see appendix C). 

• Parent self-reports of school safety. Parents were asked about the perceived 
seriousness of a number of problems at their child’s school commonly associated with 
danger and rule-breaking. The specific items, all drawn from the surveys used in 
previous experimental evaluations of scholarship programs, were: 

- Property destruction; 

- Tardiness; 

- Truancy; 

- Fighting; 

- Cheating; 

- Racial conflict; 

- Weapons; 

- Drug distribution; 

- Drug and alcohol use; and 

- Teacher absenteeism. 

Parents were asked to label these conditions as “very serious,” “somewhat serious,” or 
“not serious” at their child’s school. Responses to these items subsequently were 
categorized as “yes” (very or somewhat serious) or “no” (not serious). The number of 
“yes” responses for each parent were then summed to create a parental danger index or 
count that ranged from 0 to 10. 

• Student self-reports of school safety. Students were asked how often (never, once or 
twice, three times or more) various adverse events had occurred to them this school 
year. The student danger indicators, drawn from previous scholarship program 
evaluations, included instances of: 

- Theft; 

- Being offered drugs; 

- Physical assault; 

- Threats of physical harm; 

- Observations of weapons being carried by other students; and 

- Bullying. 



Previous experimental evaluations of scholarship programs used summary scales to measure parental satisfaction, as we do 
below, but generally presented parental and student danger outcomes and student satisfaction outcomes for the individual items 
that we list here. We have created scales of satisfaction and indexes of danger concerns because the outcome patterns for the 
individual items tend to be generally consistent and, under such conditions, scaling them or combining them in indices tends to 
generate more reliable results (see appendix D). The impacts of the Program on each individual item in the various scales and 
indices are discussed in each section of chapter 4 that reports impacts on scaled outcomes. 
Responses to these items were categorized as “yes” (at least once) or “no” (never) to 
create a count of the number of reported events that ranged from 0 to 6 (see Spector, 
1992). 

• Parental self-reports of satisfaction. Parent satisfaction with their child’s school was 
measured three ways. First, parents were asked “What overall grade would you give 
this child’s current school?” Two outcomes were created from this question: (1) a 
5-point grading scale ranging from 1 (an F) to 5 (an A), and (2) a dichotomous variable 
equal to 1 if the parent assigned an A or B, and equal to zero otherwise. 

In addition, parents were asked “How satisfied are you with the following aspects of 
your child’s school?” and to rate each of the following dimensions on a 4-point scale 
ranging from “very dissatisfied” to “very satisfied:” 

- Location of school; 

- School safety; 

- Class sizes; 

- School facilities; 

- Respect between teachers and students; 

- How much teachers inform parents of students’ progress; 

- How much students can observe religious traditions; 

- Parental support for the school; 

- Discipline; 

- Academic quality; 

- Racial mix of students; and 

- Services for students with special needs. 

The responses to this set of items were combined into a single parent satisfaction scale 
using maximum likelihood IRT. IRT is a procedure which draws upon the complete 
pattern of responses to a set of questions in order to develop a reliable gauge of the 
respondent’s level of a “latent” or underlying trait, in this case satisfaction (Hambleton, 
Swaminathan, and Rogers, 1991). (See appendix D for a more detailed description of 
IRT.) In situations such as exist here, when individual questions each capture some 
piece of a more general construct (e.g., satisfaction) and the response categories capture 
the degree as well as the direction of the response, the IRT method is superior to count- 
based indices in measuring subjective conditions or traits. Two specific advantages of 
IRT scoring are that: (1) it allows scores to be assigned in the event that a respondent 
missed one of the scale items in his/her response, and (2) it identifies specific items that 
are highly effective in distinguishing respondents and assigns more weight to those 
items in the scale. For example, the IRT method is commonly used to score 
standardized tests. It will identify the questions that most clearly separate the better 
performing students from the worse performing students and count those items more 
heavily in generating the final test scores. 



As a count of discrete items, the student school danger index and the similar index from parent reports were not subject to 
internal consistency checks using Cronbach’s Alpha. The sum of item counts lacks multi-dimensional features of summated 
scale items, such as both direction and degree, that generate the data patterns necessary to produce consistency ratings. 
The consistency and reliability of scaled measures of traits such as satisfaction can be 
determined by a rating statistic called Cronbach’s Alpha (Spector, 1992). The 
completed parent satisfaction scale exhibited very high consistency with a Cronbach’s 
Alpha of 0.938. 

• Student self-reports of satisfaction. Students were also asked to grade their school 
using the same question asked of parents, and two outcomes were created — a grade 
range and a dichotomous variable — as discussed above for parents. Students were 
similarly asked to rate various specific aspects of their current school on a 4-point 
scale. The individual items covered the following topics (see appendix D): 

- Behavior and discipline; 

- Academic quality; 

- Social supports and interactions; and 

- Teacher quality. 

A single composite satisfaction scale was created for students using the same IRT 
procedures used to create the parent satisfaction scale. The student scale exhibited 
somewhat lower consistency with a Cronbach’s Alpha of 0.814, still well within the 
range of acceptable reliability. 
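
Two of the constructions described above, the count-based danger index and the Cronbach’s Alpha reliability check, can be sketched in a few lines of Python. This is an illustration only and not the evaluation team's code: the function names and the example responses are hypothetical, the treatment of unanswered items is an assumption, and the alpha shown is the textbook formula applied to raw item scores (the report does not state exactly how its alphas were computed).

```python
from statistics import variance

# "Very serious" and "somewhat serious" both count as a "yes".
YES_RESPONSES = {"very serious", "somewhat serious"}


def danger_index(responses):
    """Count the items a parent rated very or somewhat serious.

    responses: dict mapping an item name to the parent's answer.
    Assumption: unanswered items simply do not add to the count.
    """
    return sum(1 for answer in responses.values() if answer in YES_RESPONSES)


def cronbach_alpha(rows):
    """Textbook Cronbach's alpha for complete item responses.

    rows: one list per respondent, one numeric score per scale item.
    """
    k = len(rows[0])  # number of items in the scale
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up parent answers for three of the ten safety items.
parent = {
    "fighting": "very serious",
    "tardiness": "somewhat serious",
    "cheating": "not serious",
}
index = danger_index(parent)

# Made-up ratings on a 4-point scale (1 = very dissatisfied, 4 = very satisfied).
ratings = [
    [4, 4, 3],
    [3, 3, 3],
    [2, 2, 1],
    [4, 3, 4],
    [1, 2, 1],
]
alpha = cronbach_alpha(ratings)
print(index, round(alpha, 3))
```

With the full set of ten parent items the count ranges from 0 to 10; the same function applied to the six student items, with “yes” redefined as “at least once,” would give the 0 to 6 student index.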



Baseline or “Preprogram” Covariates 

In addition to the collection of outcome data for each study participant, various personal, 
family, and educational characteristics of the students in the impact sample were obtained prior to random 
assignment via the application form (including a parent survey) and administration of the SAT-9 in 
reading and math (see appendix E). Such “baseline” covariates are important in the context of an 
experimental evaluation, because they permit researchers to (1) verify the integrity of the random 
assignment (see chapter 2), (2) inform the generation of appropriate non-response weights, and (3) 
include the covariates in regressions to improve the precision of the estimations of treatment impacts and 
adjust for any baseline differences across the treatment and control groups (for a spirited exchange on this 
question, see Howell and Peterson, 2004; Krueger and Zhu, 2004a, 2004b; Peterson and Howell, 2004). 
The covariates that are most useful in performing each of these three functions are those that previous 
research has linked to the study outcomes of interest (Howell and Peterson et al., 2002, p. 212). These 



J. C. Nunnally is credited with developing the widely accepted standard that a Cronbach’s Alpha above .70 demonstrates an 
acceptable degree of internal consistency for a multi-item scale. 

Cohort 1 baseline test scores were obtained from the DCPS accountability testing database. Because DCPS dropped the SAT-9 
as its accountability test in 2005, baseline test scores for cohort 2 were obtained through SAT-9 administration by Westat. 

Analysts tend to agree that baseline covariates are useful in these ways within the context of an RCT, although some of them 
disagree regarding which of the three functions of preprogram covariates is most important. 

Previous analysts of voucher experiments have used a similar set of baseline covariates to estimate attendance at outcome data 
collection events and therefore inform student-level non-response weights. 
variables regularly are included in regression models designed to estimate educational outcomes such as 
test scores, or, in the case of the SINI indicator, are especially important to this particular evaluation (see 
Krueger and Zhu, 2004a, p. 692): 

• Student’s baseline reading scale score, 

• Student’s baseline math scale score, 

• Student attended a school designated SINI 2003-05 indicator, 

• Student’s age (in months) at the time of application for an Opportunity Scholarship, 

• Student’s forecasted entering grade for the next school year, 

• Student’s gender - male indicator, 

• Student’s race - African American indicator, 

• Special needs indicator - whether the parent reported that the student has a disability, 

• Mother has a high school diploma indicator (GED not included), 

• Mother has a 4-year college degree indicator, 

• Mother employed either full or part time indicator, 

• Household income — reported total annual income, 

• Total number of children in student’s household, and 

• Stability — the number of months the family has lived at its current address. 



3.4 Sampling and Non-Response Weights 

Sampling weights were used in the impact analyses to account for the fact that the study 
sample was selected differently in the 2 years of OSP implementation as well as across different priority 
groups and grade bands (see section 2.2). Conducting the analyses without weights would run the risk of 
confusing the effect of the treatment with compositional differences between the treatment and control 
groups due to the fact that certain kinds of eligible applicants had higher or lower probabilities of being 



This list of baseline covariates is almost identical to the one that Krueger and Zhu used in one of their re-analyses of the data 
from the New York City voucher experiment. The only differences include alternate measures of the same characteristic (e.g., 
our measure of student disability includes English language learners whereas Krueger and Zhu included a separate indicator 
for English spoken at home) or variables that we were not able to measure at baseline (e.g., mother’s religion and mother’s 
place of birth). 
awarded a scholarship. The sampling weights consist of two primary parts: (1) a “base weight,” which is 
simply the inverse of the probability of being selected to treatment (or control) and (2) an adjustment for 
differential non-response to data collection. (See appendix F for a more detailed explanation of the 
calculation of these weights.) 
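
The two-part structure of the weights can be illustrated with a short Python sketch. The function name and the example values are hypothetical, and the non-response adjustment is assumed here to be the inverse of the response rate in a student's weighting cell, a common simplification; the actual calculation is the one documented in appendix F.

```python
def analysis_weight(p_selected, response_rate):
    """Base weight (inverse selection probability) times a non-response adjustment.

    Assumption: the adjustment is the inverse of the response rate in the
    student's weighting cell. The report's own procedure is in appendix F.
    """
    if not (0 < p_selected <= 1 and 0 < response_rate <= 1):
        raise ValueError("probabilities must be in (0, 1]")
    base_weight = 1.0 / p_selected
    nonresponse_adjustment = 1.0 / response_rate
    return base_weight * nonresponse_adjustment


# A student with a 50 percent chance of being offered a scholarship, in a
# cell where 80 percent of students provided outcome data (made-up values).
w = analysis_weight(0.5, 0.8)
print(w)  # 2.5
```

Students from groups with a lower chance of selection, or from cells with lower response, receive larger weights, so the weighted treatment and control groups keep the same composition.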



3.5 Analytical Model for Estimating the Impact of the Program, or the Offer 
of a Scholarship (Experimental Estimates) 

To estimate the extent to which the Program has an effect on participants, this study first 
compares the outcomes of the two experimental groups created through random assignment, or the ITT 
approach referred to earlier in this chapter. The only completely randomized and therefore strictly 
comparable groups in the study are those students who were offered scholarships (the treatment group) 
and those who were not offered scholarships (the control group) based on the lottery. The random 
assignment of students into treatment and control groups should produce groups that are similar in key 
characteristics, both those we can observe and measure (e.g., family income, prior academic achievement) 
and those we cannot (e.g., motivation to succeed or benefit from the program). A comparison of these two 
groups is the most robust and reliable measure of Program impacts because it requires the fewest 
assumptions and least effort to make the groups similar except for their participation in the Program. 



Overall Program Impacts 

Because the RCT approach has the important feature of generating comparable treatment 
and control groups, we used a common set of analytic techniques, designed for use in social experiments, 
to estimate the Program’s impact on test scores and the other outcomes listed above. These analyses 
began with the estimate of simple mean differences using the following equation, illustrated using the test 
score of student i in year t ($Y_{it}$): 



(1) $Y_{it} = \alpha + \tau T_{it} + \varepsilon_{it}$ if $t > k$ (period after program takes effect), 

where $T_{it}$ is equal to 1 if the student has the opportunity to participate in the scholarship Program (i.e., the 
award rather than the actual use of the scholarship) and is equal to 0 otherwise. Equation (1) therefore 
estimates the effect of the offer of a scholarship on student outcomes. Under this ITT model, all students 
who were randomly assigned by virtue of the lottery are included in the analysis, regardless of whether a 
member of the treatment group used the scholarship to attend a private school or for how long. 
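
When the offer indicator is the only regressor, the estimated treatment coefficient in equation (1) equals the weighted difference in mean outcomes between the treatment and control groups. A minimal Python sketch of that comparison follows; the function names and all numbers are made up for illustration and are not results from the evaluation.

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def itt_mean_difference(outcomes, offered, weights):
    """Weighted treatment-minus-control difference in mean outcomes.

    offered[i] is 1 if student i was offered a scholarship (T = 1), else 0.
    Every randomized student is included, whether or not the offer was used.
    """
    treat = [(y, w) for y, t, w in zip(outcomes, offered, weights) if t == 1]
    control = [(y, w) for y, t, w in zip(outcomes, offered, weights) if t == 0]
    mean_t = weighted_mean(*zip(*treat))
    mean_c = weighted_mean(*zip(*control))
    return mean_t - mean_c


scores = [610.0, 630.0, 600.0, 620.0]   # made-up scale scores
offered = [1, 1, 0, 0]                  # lottery result, not actual usage
weights = [1.0, 3.0, 2.0, 2.0]          # made-up analysis weights
impact = itt_mean_difference(scores, offered, weights)
print(impact)  # 15.0
```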
Proper randomization renders experimental groups approximately comparable, but not 
necessarily identical. In the current study, some modest differences, almost all of which are not 
significant, exist between the treatment group and the control group counterfactual at baseline (see 
Krueger and Zhu, 2004a; Peterson and Howell, 2004). The basic regression model can, therefore, be 
improved by adding controls for observable baseline characteristics to increase the reliability of the 
estimated impact by accounting for minor differences between the treatment and control groups at 
baseline and improving the precision of the overall model. 

This yields the following equation to be estimated: 



(2) $Y_{it} = \alpha + \tau T_{it} + X_i \gamma + \delta_R R_{it} + \delta_M M_{it} + \varepsilon_{it}$, 



where $X_i$ is a vector of student and/or family characteristics measured at baseline and known to influence 
future academic achievement, and $R_{it}$ and $M_{it}$ refer to baseline reading and mathematics scores, 
respectively (each of the included covariates is described below). In this model, $\tau$ (the parameter of sole 
interest) represents the effect of scholarships on test scores for students in the program, conditional on $X_i$ 
and the baseline test scores. The $\delta$’s reflect the degree to which test scores are, on average, correlated over 
time. With a properly designed RCT, baseline test scores and controls for observable characteristics that 
predict future achievement should improve the precision of the estimated impact. 
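
A covariate-adjusted model of this kind can be estimated by weighted least squares. The sketch below uses NumPy on simulated data; every number in it (sample size, effect size, score distributions, weights) is invented for illustration, the helper name `wls_coefficients` is hypothetical, and the evaluators' actual estimation was carried out in STATA.

```python
import numpy as np


def wls_coefficients(y, X, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i . beta)^2."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 2000
T = rng.integers(0, 2, size=n)            # 1 = offered a scholarship
R = rng.normal(600.0, 40.0, size=n)       # baseline reading (made up)
M = rng.normal(600.0, 40.0, size=n)       # baseline math (made up)
w = rng.uniform(1.0, 3.0, size=n)         # analysis weights (made up)
true_tau = 3.0                            # simulated effect of the offer
y = 50.0 + true_tau * T + 0.6 * R + 0.3 * M + rng.normal(0.0, 5.0, size=n)

# Columns: intercept, offer indicator, baseline reading, baseline math.
X = np.column_stack([np.ones(n), T, R, M])
alpha_hat, tau_hat, delta_r_hat, delta_m_hat = wls_coefficients(y, X, w)
print(f"estimated tau = {tau_hat:.2f}")
```

Because the baseline scores absorb much of the variation in the outcome, the residual variance falls and the treatment coefficient is estimated more precisely than in the unadjusted comparison.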



For example, although the average test scores of the cohort 1 and cohort 2 treatment and control groups in reading and math 
are all statistically comparable, in all four possible comparisons (cohort 1 reading, cohort 1 math, cohort 2 reading, cohort 2 
math) the control group average baseline score is higher. That is, on average the members of the control group began the 
experiment with slightly higher reading and math test scores than the members of the treatment group. The control group 
baseline test score advantage for cohort 1 reading, cohort 2 reading, cohort 1 mathematics, and cohort 2 mathematics was 4.7, 
8.4, 4.1, and 8.7 respectively, when the pre-imputation scores were used. The corresponding four differences were 4.1, 7.0, 3.7, 
and 1.6 when the post-imputation scores were used. Thus, after imputation the differences between treatment and control 
group baseline scores were attenuated. A joint F-test for the significance of the pattern of test score differences at baseline was 
not significant for the pre-imputation data but was significant using the post-imputation scores. This apparent anomaly is a 
result of the larger sample sizes after imputation, which reduces the standard errors across the board, thereby increasing the 
precision of the statistical test and the resulting likelihood of a statistically significant result. To deal with this difference in test 
scores across the treatment condition at baseline, we simply include the post-imputation baseline test scores in a statistical 
model that produces regression-adjusted treatment impact estimates. Controlling for baseline test scores in this way effectively 
transforms the focus of the analysis from one on achievement levels after 1 year, which could be biased by the higher average 
baseline test scores for the control group, to one on comparative achievement gains after 1 year from whatever baseline the 
individual student performed at to start the experiment. Because including baseline test scores in regression models both levels 
the playing field in this way and increases the precision of the estimate of treatment impact, it is a common practice in 
education evaluations generally and school scholarship experiments particularly. 
Adjustment for Differences in Days of Exposure to School 



A final important covariate to include in this model is the number of days from September 1 
to the date of outcome testing for each student. This “days until test” variable, signified by DT in the 
equation below, controls for the fact that test scores were obtained over a 4-month period each spring and 
that a student’s ability to perform on the standardized tests can be affected by the length of time he/she 
has been exposed to schooling. The “days until test” variable was further interacted with elementary 
school status (i.e., K-5), because younger students tend to gain relatively more than older students from 
additional days of schooling. Thus, the models that produced the regression-adjusted impact estimates 
for this analysis took the general form: 



(3) $Y_{it} = \alpha + \tau T_{it} + X_i \gamma + \delta_R R_{it} + \delta_M M_{it} + \delta_{DT} DT_{it} + \varepsilon_{it}$. 



The same set of baseline covariates and the days-until-test variable were used in all 
regression models, regardless of whether student achievement, school satisfaction, or school safety 
outcomes were being estimated. 
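
The days-until-test covariate and its interaction with elementary school status can be computed as in the following Python sketch. The function names, the grade coding (0 for kindergarten), and the example test date are hypothetical; entering both the main term and the interaction term as regressors is also an assumption about how the interaction was specified.

```python
from datetime import date


def days_until_test(test_date, school_year_start):
    """Days from September 1 of the school year to the outcome test date."""
    return (test_date - date(school_year_start, 9, 1)).days


def days_until_test_terms(test_date, school_year_start, grade):
    """Return (DT, DT x elementary) for one student.

    grade: 0 for kindergarten, 1 to 12 otherwise. Elementary means K-5.
    """
    dt = days_until_test(test_date, school_year_start)
    elementary = 1 if grade <= 5 else 0
    return dt, dt * elementary


# A third grader and a ninth grader tested on the same (made-up) Saturday.
third_grader = days_until_test_terms(date(2005, 4, 16), 2004, grade=3)
ninth_grader = days_until_test_terms(date(2005, 4, 16), 2004, grade=9)
print(third_grader, ninth_grader)  # (227, 227) (227, 0)
```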



Subgroup ITT Impacts 

In addition to estimating overall program impacts, this study was interested in the possibility 
of heterogeneous impacts (i.e., separate impacts on particular subgroups of students). Subgroup impacts 
were estimated by augmenting the basic analytic equation (3) to allow different treatment effects for 
different types of students, as follows: 



September 1 was chosen as a common reference date because most private schools approximately follow the DCPS academic 
calendar, and September 1 fell within the first week of schooling in fall of both 2004 and 2005. 

The actual statistical results confirmed the validity of this assumption, as the effect of the days-until-test variable on outcome 
test scores was positive and statistically significant for K-5 students but indistinguishable from zero for 6-12 students. 

The possibility of a nonlinear relationship of days-until-test with the outcome variables was examined through the use of a 
categorized version of the days-until-test variable, with one category level including students with days-until-test below the 
median value, one level with days-until-test in the third quartile (median to 75th percentile), and one level with days-until-test 
in the fourth quartile (75th percentile to maximum). This allows for a quadratic relationship (down-up-down for example) in the 
regression estimation if such a relationship exists. The regression with the nonlinear days-until-test component did not provide 
a better fit to the data than the regression modeling a simple linear slope. As a result, the simpler model was used. 

After the initial impacts were obtained, a second set of estimates was run to test the sensitivity of the results to the set of 
covariates included in the model. This sensitivity model used only cohort, grade, special needs, number of children in the 
household, African American race, baseline reading, baseline math, and days until test as control variables, as these variables 
tended to be significant predictors of test score outcomes in the first set of models. No important differences were found. 
(4) $Y_{ikt} = \mu + \tau T_{ikt} + \tau_p P_i T_{ikt} + [\text{term illegible in source}] + X_{ik} \gamma + \delta_R R_{it} + \delta_M M_{it} + \delta_{DT} DT_{it} + \varepsilon_{ikt}$ 

where P is an index for whether a student is a member of a particular subgroup (the P must be part of the 
X’s). The coefficient $\tau_p$ indicates the marginal treatment effect for students in the designated subgroup. 
These models were used to estimate impacts on the separate components of the subgroup (e.g., impacts on 
males and females separately), and the difference in impacts between the two groups. These analyses of 
possible heterogeneous impacts across subgroups are conducted within the context of the experimental 
ITT design. Thus, as with the estimation of general program-wide impacts, any subgroup-specific impacts 
identified through this approach are understood to have been caused by the treatment. The ability to 
reliably identify separate impacts, however, depends on the sample sizes within each subgroup. 
Consequently, subgroup impacts were estimated for the following groups: 

• Applied from a school ever designated SINI — yes and no; 

• Academically lower performing student at the time of baseline testing (i.e., bottom 
one-third of the test score distribution) and higher performing (top two-thirds); 

• Gender — male and female; 

• Grade band — K-8 and high school; and 

• Cohort — 1 and 2. 
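
The interaction approach to subgroup impacts can be sketched with NumPy on simulated data. All values below are invented, the other baseline covariates are omitted for brevity, and the code is an illustration of the general technique rather than the evaluators' implementation: the coefficient on the offer indicator gives the impact for students outside the subgroup, and adding the interaction coefficient gives the impact for students inside it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
T = rng.integers(0, 2, size=n)        # 1 = offered a scholarship
P = rng.integers(0, 2, size=n)        # 1 = member of the subgroup (made up)
tau, tau_p = 2.0, 4.0                 # simulated base and marginal effects
y = 600.0 + tau * T + tau_p * P * T + 5.0 * P + rng.normal(0.0, 5.0, size=n)

# Columns: intercept, T, P*T, and P itself (P must also enter as a covariate).
X = np.column_stack([np.ones(n), T, P * T, P])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
_, tau_hat, tau_p_hat, _ = beta

impact_outside_subgroup = tau_hat            # impact when P = 0
impact_in_subgroup = tau_hat + tau_p_hat     # impact when P = 1
print(f"P=0: {impact_outside_subgroup:.2f}, P=1: {impact_in_subgroup:.2f}, "
      f"difference: {tau_p_hat:.2f}")
```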



Computation of Standard Errors 

In computing standard errors it is necessary to factor in the stratified sample design, 
clustering of student outcomes within individual families, and non-response adjustments. As a 
consequence, all of the impact analyses were completed using sampling weights in STATA. The effects 
of family clustering, which is not part of the sample design, but which may be having a measurable effect 



The lower third of the baseline performance distribution was chosen because preliminary power analyses suggested it would be 
the most disadvantaged performance subgroup that would include a sufficient number of members to reveal a distinctive 
subgroup impact if one existed. 

There is also a positive effect on variance (a reduction in standard errors) from the stratification. This effect will not be 
captured in the primary analyses, making the resultant variance estimators conservative. We will compute variances including 
the stratification directly via the use of jackknife replicate weights and also using Taylor-series linearization via STATA, but 
this will be a secondary analysis designed to confirm that the main analysis variances are not excessively conservative. 
on variance, were taken into account using robust regression calculations (i.e., “sandwich” variance 
estimates) (see Liang and Zeger, 1986; White, 1982). 
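
For readers unfamiliar with “sandwich” estimates, the following NumPy sketch shows one textbook form of a cluster-robust variance for weighted least squares, with families as clusters. It is not the evaluators' STATA code: the function name, the simulated data, and the omission of any small-sample correction are all assumptions made for illustration.

```python
import numpy as np


def cluster_robust_se(X, y, w, clusters):
    """Weighted least squares with cluster-robust (sandwich) standard errors.

    No small-sample correction is applied.
    """
    bread = np.linalg.inv(X.T @ (X * w[:, None]))
    beta = bread @ (X.T @ (w * y))
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(clusters):
        idx = clusters == c
        # Weighted score contributions summed within one cluster (family).
        score = X[idx].T @ (w[idx] * resid[idx])
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(2)
n_families = 500
family = np.repeat(np.arange(n_families), 2)      # two siblings per family
T = rng.integers(0, 2, size=family.size)          # 1 = offered a scholarship
family_effect = rng.normal(0.0, 5.0, size=n_families)[family]
y = 600.0 + 3.0 * T + family_effect + rng.normal(0.0, 5.0, size=family.size)
X = np.column_stack([np.ones(family.size), T])
w = np.ones(family.size)

beta, se_family = cluster_robust_se(X, y, w, family)
# Treating every student as a separate cluster gives White's (1982)
# heteroskedasticity-robust standard errors.
_, se_independent = cluster_robust_se(X, y, w, np.arange(family.size))
print(se_family, se_independent)
```

Comparing the two sets of standard errors shows how much clustering matters in a given dataset, which is the kind of comparison the evaluation reports for family and baseline-school clustering.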



3.6 Analytical Model for Estimating the Impact of Using a Scholarship and 
of Attending a Private School 

Although the ITT analysis described above is the most reliable estimate of Program impacts, 
it cannot answer the full set of questions that policymakers have about the effects of the Program. Two 
different techniques, which deviate to different extents from the random assignment design, are necessary 
to estimate the impact on students and families from using a scholarship or from attending a private 
school (see appendix G for a more detailed discussion of the analytic methods, including the equations 
used in the models). 



Impact of Using a Scholarship 

For the scholarship awardees in the OSP impact sample that provided year 1 outcome test 
scores and the name of their school, 80 percent were attending a private school. The 20 percent of the 
treatment students who did not use their scholarships are treated the same as scholarship users for 
purposes of determining the effect of the offer of a scholarship, so as to preserve the integrity of the 
random assignment, even though scholarship decliners likely experienced no impact from the Program. 
Fortunately, there is a way to estimate the impact of the OSP on the average participant who actually used 
a scholarship, or what we refer to as the “impact-on-the-treated” (IOT) estimate. This approach does not 
require information about why 20 percent of the individuals declined to use the scholarship when 
awarded, or how they differ from other families and children in the sample. But if one can assume that 



We also examined the effect on the standard errors of the estimates of clustering on the school students attended at baseline. 
Baseline school clustering reduced the standard errors of the various impact estimates by an average of 2 percent, compared to 
an average reduction of less than 1 percent due to clustering by family. These results indicate that the student outcome data are 
almost totally independent of the most likely sources of outcome clustering. They may appear to be counter-intuitive, since 
formally accounting for clustering among observations usually increases variance in effects; however, since the randomization 
cut across families and baseline schools, it is possible that family and school clusters served as the equivalent of random- 
assignment blocks, as most multi-student families and schools contained some treatments and some controls. Such 
circumstances normally operate to reduce variance in subsequent impact estimates, as the within-cluster positive correlation 
comes into the calculation of the variance of the treatment-control difference with a minus sign. This last point is a technical 
matter that we plan to explore in greater depth in the future. 

A total of 17 treatment students who took the follow-up test in math (1.6 percent) and 7 treatment students who took the 
follow-up test in reading did not identify their current school. Since these observations represent less than 2 percent of the impact sample, the 
evaluators simply excluded them from the portion of the analysis focused on the impact of treatment on the treated and the 
effect of private schooling. 
decliners experience zero impact from the scholarship Program (which seems reasonable given that they 
did not use the scholarship), it is possible to avoid these kinds of assumptions about (or analyses of) 
selection into and out of the Program. 

This is possible by using the original comparison of all treatment group members to all 
control group members (i.e., the ITT estimates described above) but re-scaling it to account for the fact 
that a known fraction of the treatment group members did not actually avail themselves of the treatment 
and therefore experienced zero impact from the treatment. The average treatment impact that was 
generated from a mix of treatment users and nonusers is attributed only to the treatment users, by dividing 
the average treatment impact by the proportion of the treatment group who used their scholarships. For 
this report, depending on the specific outcome being rescaled, this “Bloom adjustment” (Bloom, 1984) 
will increase the size of the ITT impacts by 25-35 percent, since the percentage of treatment users among 
the population of students that provided valid scores on the various test and survey outcomes ranged from 
74-80 percent. 
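
The arithmetic of the Bloom adjustment is simple enough to show directly. In this Python sketch the function name is hypothetical; the usage rates of 80 and 74 percent are the ends of the range cited above, and the ITT impact of 1.0 is a placeholder rather than an estimate from the evaluation.

```python
def bloom_adjust(itt_impact, usage_rate):
    """Rescale an ITT impact to an impact on scholarship users (Bloom, 1984).

    Assumes students who declined the scholarship experienced zero impact.
    """
    if not 0 < usage_rate <= 1:
        raise ValueError("usage_rate must be in (0, 1]")
    return itt_impact / usage_rate


# With 80 percent usage, impacts grow by a factor of 1/0.80 = 1.25;
# with 74 percent usage, by a factor of 1/0.74, roughly 1.35.
factors = {rate: round(bloom_adjust(1.0, rate), 2) for rate in (0.80, 0.74)}
print(factors)  # {0.8: 1.25, 0.74: 1.35}
```

These factors correspond to the 25 to 35 percent increase in the size of the ITT impacts described above.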

In the current evaluation, conventional Bloom adjustment may not be sufficient to accurately 
estimate the impact of using the OSP scholarship. It is conceivable that the design of the OSP Program 
and lotteries made it possible for some control group members to attend participating private schools, 
above and beyond the rate at which low-income students would have done so in the absence of the 
Program. Statistical techniques that take this “program-enabled crossover” into account are necessary for 
testing the sensitivity of the evaluation’s impact estimates. 

In a social experiment, even as some students randomized into the treatment group will 
decline to use the treatment, some students randomized into the control group will obtain the treatment 
outside of the experiment. For example, in medical trials, this control group “crossover” to the treatment 
can occur when the participants in the control group purchase the equivalent of the experimental 
“treatment” drug over the counter and use it as members of the treatment group would. The fact that 
crossovers have obtained the treatment does not change their status as members of the control group (just
as treatment decliners forever remain treatments) for two reasons: (1) changing control crossovers to
treatments would undermine the initial random assignment, and (2) control crossover typically represents 
what would have happened absent the experimental program and therefore is an authentic part of the 
counterfactual that the control group produces for comparison. If not for the medical trial, the control 
crossovers would have obtained the similar drug over the counter anyway. Therefore, any effect that the 
crossover to treatment has on members of the control group is factored into the ITT and Bloom-adjusted 
IOT estimates of impact as legitimate elements of the counterfactual. 
In the case of the OSP experiment, control crossover takes place in the form of students in
the control group attending private school. Among the members of the control group who provided 
outcome tests in math, 15 percent reported attending a private school. This crossover rate is in the higher 
end of the range reported for previous experimental evaluations of privately funded scholarship programs 
(Howell and Peterson et al., 2002, p. 44). The crossover rate also is higher for control group students 
with siblings in the treatment group (18 percent) compared to those without treatment siblings (11 
percent), a difference that is statistically significant beyond the 99 percent confidence level. At outcome 
data collection events, some parents of control group students commented to evaluation staff that their 
control-group child was accepted into a participating private school free-of-charge because he or she had 
a treatment group sibling who was using a scholarship to attend that school, and private schools were 
inclined to serve a whole family. Thus, apparently some of the control crossover that is occurring in the 
OSP could be properly characterized as “Program-enabled” and not a legitimate aspect of the 
counterfactual. 

The data suggest that 4 percent of the control group were likely able to enroll in a private 
school because of the existence of the OSP. This hypothesis is derived from the fact that 11 percent of the
control group students without treatment siblings are attending private schools, whereas 15 percent of the 
control group overall is in private schools. Since the 11 percent rate for controls without treatment 
siblings could not have been influenced by “Program-enabled crossover,” we subtract that “natural 
crossover rate” from the overall rate of 15 percent to arrive at the hypothesized Program-enabled 
crossover rate of 4 percent. To adjust for the fact that this small component of the control group may have 
actually received the private-schooling treatment by way of the Program, the estimates of the impact of
scholarship use will include a “double-Bloom” adjustment. We will rescale the pure ITT impacts that are 
statistically significant by an amount equal to the treatment decliner rate (~20 percent), as described 
above and, in addition, rescale in the same manner for the possible Program-enabled crossover rate (~4 
percent). This strategy will provide upper and lower bounds for the IOT estimates. 
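The arithmetic behind the Program-enabled crossover rate and the "double-Bloom" rescaling can be sketched as follows. This is an illustration under one stated assumption: that rescaling "in the same manner" means subtracting both the decliner rate and the Program-enabled crossover rate from 1 in the denominator. The function names are hypothetical, and the ITT impact of 1.0 is a placeholder.

```python
def program_enabled_crossover(overall_rate: float, natural_rate: float) -> float:
    """Crossover attributable to the Program: overall rate minus natural rate."""
    return overall_rate - natural_rate


def double_bloom_adjust(itt_impact: float, decliner_rate: float,
                        enabled_crossover_rate: float) -> float:
    """Rescale an ITT impact for both treatment decliners and
    Program-enabled control crossovers (assumed form of the adjustment)."""
    denominator = 1.0 - decliner_rate - enabled_crossover_rate
    if denominator <= 0.0:
        raise ValueError("rates leave no net difference in treatment receipt")
    return itt_impact / denominator


# Rates quoted in the text: 15 percent overall and 11 percent natural crossover.
enabled = program_enabled_crossover(0.15, 0.11)

itt = 1.0  # placeholder impact, not a study result
bloom_only = double_bloom_adjust(itt, 0.20, 0.0)      # decliners only
double_bloom = double_bloom_adjust(itt, 0.20, enabled)  # decliners and crossovers
print(f"Program-enabled crossover: {enabled:.2f}")
print(f"Bloom: {bloom_only:.3f}  double-Bloom: {double_bloom:.3f}")
```

Under this assumed form, the single and double adjustments give the two bounds on the impact of scholarship use that the text refers to.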



First-year control group crossover rates in the previous three-city experiment were 18 percent in Dayton, Ohio; 11 percent in 
Washington, DC; and just 4 percent in New York City. Among those three cities, the average tuition charged by private 
schools is lowest in Dayton and highest in New York, a fact that presumably explains much of the variation in crossover rates. 

Because program oversubscription rates varied significantly by grade, random assignment took place at the student and not the 
family level. As a result, nearly half the members of the control group have siblings who were awarded scholarships. 
Effect of Attending a Private School



A third analysis that is made possible within the context of an experimental study is an 
estimation of the effect of experiencing the treatment, whether as a member of the treatment or control 
group. Such an analysis is conceptually distinct from estimating the IOT by way of the Bloom or “double-Bloom”
adjustments described above, since it examines outcome patterns in both treatment and control
groups that could be results of exposure to the treatment, either inside or outside the experiment. 

Such an analysis is inherently non-experimental, since, short of mandating the use of the 
scholarship by the treatment group and outlawing private schooling for the control group, there is no way 
to perfectly randomize scholarship use. So long as parents and students have the option of declining 
scholarships or obtaining private schooling outside of the experimental program, estimates of private 
schooling effects within the context of experiments are biased by the selective nature of scholarship users 
relative to all scholarship winners and control group “crossovers” relative to controls that remain in public 
schools. 



In this setting, instrumental variable (IV) analysis provides a well-established method to 
generate the best estimate of the impact of private schooling, drawing upon the ITT estimator (Howell 
and Peterson et al., 2002, pp. 49-51). Two stages of regression equations are run in order to arrive at an 
estimate of the effects of private schooling on each outcome, attempting to account for selection bias. In 
the first stage, the results of the treatment lottery and student characteristics at baseline are used to 
estimate the likelihood that individual students attended a private school in year 1. In the second stage, 
that estimate of the likelihood of private schooling operates in place of an actual private schooling 
indicator to estimate the effect of private schooling on outcomes. In cases like this experiment, the IV 
procedure will generate estimates of the effect of private schooling that will be slightly larger than the 
double-Bloom IOT impact estimates. Since the IV process places greater demands upon the data, special
attention must be paid to the significance levels of IV estimates, as some experimental impacts that are 
statistically significant at the ITT stage lose their significance when subjected to IV analysis. 
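The two-stage procedure described above can be sketched with ordinary least squares at each stage. This is a generic two-stage least squares illustration on simulated data, not the evaluators' actual model: the variable names, the single baseline covariate, and the simulated effect of 4 points are all assumptions made for the example.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, attended_private, won_lottery, baseline):
    """Estimate the effect of private schooling, using the lottery as instrument."""
    ones = np.ones(len(y))
    # Stage 1: predict private school attendance from the lottery result
    # and baseline characteristics.
    Z = np.column_stack([ones, won_lottery, baseline])
    attended_hat = Z @ ols(Z, attended_private)
    # Stage 2: use the predicted attendance in place of the actual indicator.
    X2 = np.column_stack([ones, attended_hat, baseline])
    return ols(X2, y)[1]


# Simulated data (hypothetical): unobserved motivation drives both
# private school attendance and test scores, which biases a naive comparison.
rng = np.random.default_rng(0)
n = 5000
won = rng.integers(0, 2, n).astype(float)
baseline = rng.normal(600.0, 35.0, n)
motivation = rng.normal(0.0, 1.0, n)
attended = ((0.15 + 0.60 * won + 0.10 * motivation) > rng.uniform(0, 1, n)).astype(float)
score = 0.8 * baseline + 4.0 * attended + 5.0 * motivation + rng.normal(0.0, 10.0, n)

print("IV estimate of private schooling effect:",
      round(two_stage_least_squares(score, attended, won, baseline), 2))
```

Standard errors taken naively from the second-stage regression are not valid for the IV estimate, which is one reason the text cautions that significance levels need special attention.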
4. Impact of Being Awarded a Scholarship, One Year After Application



The statute that authorized the District of Columbia Opportunity Scholarship Program (OSP) 
mandated that the Program be evaluated with regard to its impact on student test scores and safety, as well
as the “success” of the Program, which, in the design of this study, includes satisfaction with school 
choices. This chapter presents the effects of being awarded a scholarship on these outcomes 1 year after 
families and students applied to the OSP, or approximately 7 months after the start of their first possible 
school year in the Program. After providing some context for understanding the presentation of results, 
most of the chapter describes findings on the impact of the offer of a scholarship, including a discussion 
of subgroup impacts and checks for the sensitivity of the results. The results tables in this chapter convey 
information about the treatment and control group means and any difference between them (i.e., the 
programmatic impact) that is drawn from the regression equations described above in chapter 3. 
Appendix H contains a parallel set of results tables that include the raw (unadjusted) group means as well 
as additional statistical detail regarding the impact estimates. 



4.1 Interpreting the Impacts 

The impact results in the following sections are presented in a variety of ways and include a 
comprehensive approach to assessing the validity of the findings. First, there are tables that provide the 
following information for each outcome measure whenever space permits: 

• Treatment group mean; 

• Control group mean; 

• Estimated difference in means, which is the treatment impact; 

• Effect size in standard deviation units; and

• p-value. 



Specifically, the effect sizes are computed as a percentage of a standard deviation for the control group after 1 year. Since the 
outcomes of the experimental control group signal what would have happened to the treatment group in the absence of the 
intervention, a standard deviation in the distribution of the control group outcomes represents an especially appropriate gauge
of the magnitude of any treatment impacts observed. 
The means are adjusted for minor differences between the treatment and control groups at
baseline. The control group mean for a given outcome is predicted from a regression equation that 
includes an indicator variable for the treatment impact as well as the full set of baseline covariates 
described in chapter 3. The treatment group mean is then generated by adding the treatment impact from 
the regression equation to the predicted control group mean. Conceptually, the treatment group mean 
expressed in this way describes what the control group mean would have looked like had the control 
group been administered the programmatic treatment. Effect sizes translate the impact into a standard 
metric and provide information about whether the size of the impact might be considered meaningful. 
Programmatic impacts might be statistically significant in that they are reliable, but might be either 
sizable or trivial in magnitude. Given the power levels for this evaluation, however, any impacts that are 
statistically significant are likely to be at least “moderate” in size for an educational intervention. The p- 
value gives a sense of the extent to which we can be certain that an estimated impact of the Program is 
reliable and not a chance anomaly. The smaller the p-value, the more confidence we can have that an 
observed impact is due to the treatment and not merely due to chance. 
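The relationship among the reported quantities can be sketched with simple arithmetic. The figures for the math outcome (control mean 592.87, impact 2.74) are taken from table 4-1; the control group standard deviation of 34.0 is a hypothetical value used only for illustration, since it is not reported in this chapter, and the function names are likewise assumptions.

```python
def treatment_mean(control_mean: float, impact: float) -> float:
    """Treatment group mean as presented in the tables:
    predicted control mean plus the regression-estimated impact."""
    return control_mean + impact


def effect_size(impact: float, control_sd: float) -> float:
    """Impact expressed in standard deviations of the control group outcome."""
    return impact / control_sd


ASSUMED_CONTROL_SD = 34.0  # hypothetical; not reported in this chapter

math_control, math_impact = 592.87, 2.74  # math row of table 4-1
print("treatment mean:", round(treatment_mean(math_control, math_impact), 2))
print("effect size:", round(effect_size(math_impact, ASSUMED_CONTROL_SD), 2))
```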

Second, these measures are also converted into a series of figures that visually demonstrate 
the impact. In particular, brackets that represent the range of values that lie within a 95 percent
confidence interval are placed around the impact estimate. If that range of plausible values includes the 
value of zero, then we cannot reject the hypothesis that the Program had no impact on that outcome. 

Third, the analyses of each type of outcome (achievement, safety, satisfaction) used different 
specific measures and were estimated for the overall impact sample as well as various policy-relevant
subgroups of students. Under such conditions of “multiple comparisons,” there is a modest probability 
that a finding of a statistically significant difference will emerge by random chance — an event that is 
known in the statistical field as a “false discovery.” False discoveries are most common when p-values 
are very close to the prescribed cut-off level for statistical significance, in this case p < .05, and when 
many comparisons are made, thereby giving random chance extra opportunities to produce an apparently 
(but not actually) statistically significant finding. 



This approach is somewhat complicated in the case of calculating the regression adjusted means — also called the predicted 
marginals — for analysis of subgroups. The predicted marginal is defined as the average predicted response if all the students in 
the entire study sample had been in a given subgroup (e.g., males). The predicted values are derived from the fitted model 
(either ordinary least squares (OLS) for continuous outcomes or logistic for binary outcomes), where each individual's values 
on the covariates are used with the exception of the subgroup of interest, which is set to 1 and to 0 for each observation. In 
other words, the predicted marginal calculation sets every observation in the sample as fixed (i.e., assigned the value 
corresponding to male), and all other variables are what was observed for each student. 
To guard against the drawing of firm conclusions generated by false discoveries, Benjamini
and Hochberg (1995) developed a statistical test designed to screen out marginally significant findings 
from multiple comparisons, since such results could be the product of random forces. Following those 
procedures, the evaluation team statistically adjusted the p-values for results that were the product of 
multiple comparisons to account for how many comparisons were in the set that produced the particular 
finding. As a result of the Benjamini-Hochberg test, a few findings that initially appeared to be barely 
statistically significant, but were part of a set of subgroup or item-specific findings generally NOT found 
to be statistically significant, were downgraded to not statistically significant in the interest of guarding 
against false discoveries. Individual findings that were highly statistically significant or were a part of a 
group of findings that all were statistically significant were not affected in any substantive way by this 
adjustment. 



Finally, in any evaluation, decisions are made about how to handle certain data or analysis 
issues (e.g., missing data, sampling weights, etc.). While there are some commonly accepted approaches 
in research and evaluation methodology, sometimes there are multiple approaches, and any could be 
acceptable. The evaluation team chose its approach in consultation with a panel of methodology experts 
before analyzing the data and seeing the results. However, in an effort to be both transparent and 
complete, each presentation of analysis is followed by a discussion of the sensitivity testing conducted to 
determine how robust the estimates are to alternative specifications or analytic approaches. These 
alternative specifications include: 



• Trimmed sample: The sample of students was trimmed back to equalize the actual 
response rates of the treatment and control groups. Since the actual response rate of the 
treatment group was higher (79 percent), in effect the “latest treatment group members 
to respond” were dropped from the sample until the treatment response rate matched 
the control group’s pre-subsample response rate of 68 percent. This is an alternative to 
the primary analysis, where all observations were used even though a higher 
percentage of the treatment than the control group responded to outcome data 
collection. This sensitivity testing is designed to address whether the difference in 
response rates is adequately controlled for by non-response weighting. 

• Reduced set of covariates: While the main impact results included the set of covariates 
described in chapter 3, it was also possible to assume the random assignment equalized 
the characteristics of the treatment and control groups so that only a more limited set of 
covariates was needed. In the alternative specification, the covariates used in the 
regression-adjusted impacts were limited to cohort status, grade band, special-needs 
status, number of children in the household, race, days to test, and baseline test scores. 

• Simpler approach to missing baseline data: As many researchers do, the evaluation 
team imputed missing baseline data using a well-accepted approach (see appendix C). 
However, we tested whether the results hold up if the analysis was run with dummy
variables for missing data instead of using imputed data, an approach used in some
studies.



The results of the IOT analysis appear later, in chapter 5, and include Bloom and IV
regression-adjusted estimates of programmatic impact only for those outcomes found to be significantly 
influenced by the treatment in the ITT analysis. Impacts are highlighted in the accompanying tables as 
statistically significant if they exceed the 95 percent (one asterisk) or 99 percent (two asterisks) 
confidence levels, using a two-tailed significance test. 



4.2 Impacts on Student Achievement 

The statute clearly identifies students’ academic achievement as the primary outcome to be 
measured as part of the evaluation. This emphasis is consistent with the priority the Congress placed on 
having the OSP serve students from low-performing schools. Academic achievement as a measure of 
Program success is also well aligned with parents’ stated priorities in choosing schools (Wolf et al., 2005,
p. C-7). 



The primary analysis revealed no impacts of the Program, positive or negative, on student 
achievement in general after 1 year, although one of the sensitivity tests produced an estimated positive 
and statistically significant math impact. Among the subgroups examined, there were no statistically 
significant test score impacts on students who applied from SINI schools, students with lower 
performance at baseline, male students, female students, elementary or high school students, or students 
in either individual cohort. There may have been positive impacts on math achievement for participants 
who applied from non-SINI schools and students who applied to the Program with higher levels of 
academic performance; adjustments for multiple comparisons, however, suggested that those two initial 
findings might be false discoveries and, therefore, those results may not be reliable indicators of Program 
effects. 



Impacts for the Full Sample 

Overall, the main model indicated there were no statistically significant impacts of the 
Program on reading or math achievement in the first year. That is, the ITT analysis indicates that the 
outcome test scores of the treatment group, on average, were not significantly different from those of the 
control group in the first year (table 4-1). 







Table 4-1. Year 1 Test Score ITT Impacts

                         Regression-Based Impact Estimates
Student          Treatment     Control       Difference            Effect    p-value
Achievement      Group Mean    Group Mean    (Estimated Impact)    Size
Reading          606.20        605.18        1.03                  .03       .56
Math             595.61        592.87        2.74                  .08       .07

NOTE: Means are regression-adjusted using a consistent set of baseline covariates. Impacts are displayed in terms of scale
scores. Effect sizes are displayed in terms of standard deviations of the study control group distribution. Valid N for
reading = 1,649; math = 1,715. Separate reading and math sample weights used.



While there were differences, none reached the 95 percent confidence level for statistical
significance. This outcome can be viewed most clearly in figures 4-1 and 4-2. The confidence interval for
the regression-adjusted difference between the treatment and control group in reading outcomes ranges
from a negative 2.4 to a positive 4.5, and includes the value zero. Even though the estimate of the
treatment impact on reading scale scores is about one point, it could plausibly lie anywhere within the
interval; therefore, we do not know for sure if the reading impact is positive, zero, or negative. The same
is true for the estimate of the treatment impact on math scale scores. The statistical estimate of the
Program’s impact on math is a gain of 2.7 points; however, the actual impact could have been as high as
5.7 or as low as -0.3.
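The link between these intervals and the p-values in table 4-1 can be checked with a short sketch. It assumes the intervals are symmetric 95 percent intervals based on a normal approximation, which is an assumption of this illustration rather than a statement of how the evaluators computed them; the function names are hypothetical and the interval bounds are the rounded values quoted above.

```python
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # critical value, about 1.96


def includes_zero(lower: float, upper: float) -> bool:
    """True when a zero impact cannot be rejected at the 95 percent level."""
    return lower <= 0.0 <= upper


def implied_p_value(estimate: float, lower: float, upper: float) -> float:
    """Two-tailed p-value implied by a symmetric 95 percent interval."""
    std_error = (upper - lower) / (2 * Z_95)
    z = abs(estimate) / std_error
    return 2 * (1 - NormalDist().cdf(z))


# Estimates from table 4-1 and interval bounds from the text.
for name, est, lo, hi in [("reading", 1.03, -2.4, 4.5), ("math", 2.74, -0.3, 5.7)]:
    print(name, includes_zero(lo, hi), round(implied_p_value(est, lo, hi), 2))
```

Both intervals include zero, and under the stated approximation the implied p-values come out close to the .56 and .07 reported in table 4-1.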




The mean and standard deviation (SD) for the norming population varies by grade and is 463.8 (SD=38.5) for kindergartners 
tested in the spring, compared to 652.1 (SD=39.1) for fifth graders and 703.6 (SD=36.5) for students in twelfth grade. 
Figure 4-2. Regression-Adjusted Impact: Math




Subgroup Impacts 

The Program also appeared to have no clear impact on academic achievement in the first 
year for most of the policy-relevant subgroups of students examined (table 4-2). That is, there were no 
statistically significant differences between the treatment and control groups in reading or math test scores 
for students defined in the following ways: 

• Students who applied from a school designated SINI between 2003 and 2005; 

• Students who entered the Program with relatively low academic achievement in 
reading and math; 

• Males; 

• Females; 

• Students in either K-8 or in high school; and 

• Students in either cohort 1 or cohort 2. 

However, based on the main model estimates, the Program did appear to have an impact on test scores for 
students who applied with a relative advantage in academic preparation: 

• Students who had attended non-SINI public schools prior to the Program scored an 
average of 4.7 scale score points higher in math if they were in the treatment group; 
and 
Table 4-2. Year 1 Test Score Differential ITT Regression-Based Impact Estimates for Subgroups

Reading
Student Achievement:    Treatment     Control       Difference            Effect    p-value
Subgroups               Group Mean    Group Mean    (Estimated Impact)    Size
SINI ever               625.50        625.74        -.24                  -.01      .92
SINI never              592.14        590.10        2.04                  .05       .45
Difference              33.62         35.64         -2.27                 -.06      .54
Lower performance       580.89        582.48        -1.59                 -.05      .65
Higher performance      617.19        614.75        2.44                  .07       .25
Difference              -36.30        -32.27        -4.03                 -.11      .34
Male                    607.08        605.51        1.56                  .04       .55
Female                  605.40        604.88        .52                   .01       .84
Difference              1.68          .64           1.05                  .03       .78
K-8                     590.80        589.30        1.50                  .04       .45
9-12                    676.23        677.33        -1.10                 -.04      .73
Difference              -85.44        -88.03        2.60                  .07       .49
Cohort 2                591.77        592.15        -.38                  -.01      .85
Cohort 1                659.13        653.03        6.10                  .20       .11
Difference              -67.36        -60.88        -6.48                 -.18      .14

Math
Student Achievement:    Treatment     Control       Difference            Effect    p-value
Subgroups               Group Mean    Group Mean    (Estimated Impact)    Size
SINI ever               568.30        568.10        .20                   .01       .93
SINI never              615.57        610.89        4.68*                 .12       .04
Difference              -47.27        -42.79        -4.48                 -.13      .17
Lower performance       576.07        576.72        -.66                  -.02      .81
Higher performance      603.95        599.66        4.30*                 .12       .03
Difference              -27.88        -22.93        -4.95                 -.14      .16
Male                    595.89        594.61        1.27                  .04       .57
Female                  595.43        591.25        4.18                  .12       .06
Difference              .46           3.36          -2.90                 -.08      .38
K-8                     577.63        574.86        2.77                  .07       .11
9-12                    677.27        674.67        2.60                  .10       .43
Difference              -99.64        -99.81        .17                   .00       .96
Cohort 2                579.35        576.16        3.19                  .09       .07
Cohort 1                655.32        654.22        1.10                  .04       .74
Difference              -75.97        -78.06        2.09                  .06       .58

* Statistically significant at the 95 percent confidence level.

NOTE: Means are regression-adjusted using a consistent set of baseline covariates. Impacts are displayed in terms of scale
scores and effect sizes in terms of standard deviations. Valid N for reading = 1,649; math = 1,715. Separate reading and
math sample weights used.
• Students who entered the Program in the higher two-thirds of the test-score
performance distribution (averaging 43 National Percentile Ranks in math at
baseline) scored an average of 4.3 scale score points higher in math if they were in the
treatment group.

The regression-adjusted impact estimates show a modest test score gain for these two 
subgroups of study participants (figures 4-3 and 4-4). Based on the computed confidence interval, the 
difference in math achievement between the treatment and control group students who entered the 
Program from non-SINI schools could plausibly be as high as 9.1 scale score points or as low as .3 points. 
The treatment-control group difference for students entering with higher academic performance could, 
statistically, be as high as 8.1 points or as low as .5 points. In either case, both are statistically significant. 



Figure 4-3. Regression-Adjusted Impact: SINI-

Figure 4-4. Regression-Adjusted Impact: Higher
The magnitudes of these math impacts are .12 standard deviations (SD) for SINI-never
students and .12 SD for higher performance students. These effect sizes are in the lower end of the
“moderate” range for test-score results (see Grissmer, Flanagan, Kawata, and Williamson, 2000, p. 59;
Howell and Peterson et al., 2002, p. 151; Krueger, 1999, p. 525). They also correspond closely to the size
of the Minimum Detectable Effects (MDE) for the subgroups in the analysis forecasted by the power
analysis (see appendix B). To place these estimated effect sizes in context, an effect of 0.12 of a standard
deviation in math equates to a National Percentile Rank (NPR) difference of 3.05 NPR points. Because
the control group was, on average, at the 35th percentile in math at baseline, a gain of 3.05 NPRs would
bring its performance up to about the 38th percentile. Such a gain is likely to be considered modest but
educationally meaningful.
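The conversion from effect size to percentile ranks is a single multiplication, sketched below. The control group standard deviation of 25.39571 NPRs comes from the accompanying footnote, and the 35th percentile baseline from the text; the function name is an assumption for this illustration.

```python
CONTROL_SD_NPR = 25.39571    # control group SD in math, in NPRs (from the footnote)
BASELINE_PERCENTILE = 35.0   # control group average in math at baseline


def effect_size_to_npr(effect_size: float, control_sd_npr: float = CONTROL_SD_NPR) -> float:
    """Convert an effect size in SD units to National Percentile Rank points."""
    return effect_size * control_sd_npr


gain = effect_size_to_npr(0.12)
print(f"NPR gain: {gain:.2f}")
print(f"Percentile after gain: {BASELINE_PERCENTILE + gain:.0f}")
```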



Accounting for Multiple Comparisons 

The estimates of academic achievement impacts on subgroups are an example of multiple 
comparisons between the treatment and control groups on a significant number of distinct but related 
samples from the study universe. When the Benjamini-Hochberg adjustment was applied, the statistically 
significant math impacts for students from non-SINI schools and those who entered with higher levels of
academic performance lost their statistical significance and, therefore, could be false discoveries. 
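The screening logic can be illustrated with the standard Benjamini-Hochberg step-up rule. The sketch below applies it to the ten rounded math subgroup p-values from table 4-2; treating those ten values as a single set of comparisons is an assumption of this example, since the exact grouping used by the evaluation team is not specified here, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which findings survive the
    Benjamini-Hochberg step-up rule at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every finding ranked at or below the cutoff survives.
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            survives[idx] = True
    return survives


# Rounded math p-values from table 4-2, in table order (assumed comparison set).
subgroups = ["SINI ever", "SINI never", "Lower performance", "Higher performance",
             "Male", "Female", "K-8", "9-12", "Cohort 2", "Cohort 1"]
math_p = [.93, .04, .81, .03, .57, .06, .11, .43, .07, .74]

for name, p, keep in zip(subgroups, math_p, benjamini_hochberg(math_p)):
    print(f"{name:20s} p = {p:.2f}  survives: {keep}")
```

With ten comparisons, the smallest p-value would need to be at or below .005 to survive on its own, so unadjusted p-values of .03 and .04 do not pass under this assumed grouping.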



Sensitivity Checks 

As can be seen in table 4-3, the alternative specifications do not dramatically alter the overall 
findings for reading and math impacts, although the overall estimate of a positive impact in math does 
cross the threshold to be statistically significant when the analysis is limited to only the trimmed sample 
of respondents. The two statistically significant subgroup findings (for non-SINI and higher performing
students in math) remain significant under models run with only the trimmed sample and with a limited
number of covariates. However, both of these findings lose their statistical significance when analyzed 
with dummy variables in place of imputed data for missing baseline covariates. 



The standard deviation for the control group in math was 25.39571 NPRs. 
Table 4-3. Year 1 Test Score ITT Regression-Based Impact Estimates and P-Values with
Alternative Specifications

                           Original             Trimmed              Limited Covariates   Without Imputed
                           Estimates            Sample               Estimate             Data
Student Groups             Impact    p-value    Impact    p-value    Impact    p-value    Impact    p-value
Full sample: Reading       1.03      .56        2.64      .15        .96       .59        .69       .70
Full sample: Math          2.74      .07        3.43*     .03        2.67      .08        2.19      .17
Higher-performing: Math    4.30*     .03        5.58**    .00        4.18*     .03        3.46      .07
SINI-never: Math           4.68*     .04        6.39**    .00        3.83*     .04        3.55      .11

*Statistically significant at the 95 percent confidence level.

**Statistically significant at the 99 percent confidence level.

NOTES: Impacts are displayed in terms of scale scores. Valid N for math = 1,715. Math sample weights used.

In summary, the primary comparisons of the treatment and control groups regarding 
academic achievement revealed no significant differences 1 year after random assignment. One of the 
three alternative analyses did suggest a positive but small programmatic impact in math. While subgroup
analyses suggested possible impacts of the Program for students from non-SINI schools and those who 
entered with higher levels of academic performance, these may represent chance findings and need to be 
interpreted with caution. 

4.3 Impacts on Reported School Safety/Danger 

School safety is a valued feature of schools for the families who applied to the OSP. A total 
of 17 percent of cohort 1 parents at baseline listed school safety as their most important reason for seeking
to exercise school choice, second only to academic quality among the available reasons (Wolf et al.,
2005, p. C-7). A separate study of why and how OSP parents choose schools, which relied on focus group
discussions with participating parents, found that school safety was among their most important 
educational concerns (Stewart, Wolf, and Cornman, 2005, p. v). 

Parent Self-Reports 

The parents of students offered an Opportunity Scholarship in the lottery subsequently 
reported their child’s school to be less dangerous than did the parents of students in the control group 
(table 4-4). The impact of the Program on parental perceptions of school danger was -0.74 on a 10-point
scale, an effect size of -0.22 standard deviations (see figure 4-5 for a visual display). This impact on
parental concerns about school danger was largely consistent across various subgroups of students, 
including parents of students from SINI schools, parents of students who entered with lower levels of 
academic achievement, and parents of high school students. Only the parents of the small cohort 1 
subgroup of students reported no clear Program impacts on their perceptions of school danger. 

Table 4-4. Year 1 Parent Perceptions of School Danger: ITT Impacts for Full Sample and
Subgroups

                         Regression-Based Impact Estimates
School Danger:          Treatment     Control       Difference            Effect    p-value
Parents                 Group Mean    Group Mean    (Estimated Impact)    Size
Full sample             2.13          2.87          -.74**                -.22      .00
SINI ever               2.49          3.29          -.80**                -.23      .01
SINI never              1.87          2.56          -.69**                -.21      .01
Difference              .61           .72           -.11                  -.03      .77
Lower performance       2.26          3.17          -.91*                 -.25      .01
Higher performance      2.08          2.74          -.66**                -.20      .00
Difference              .18           .43           -.25                  -.07      .54
Male                    2.19          2.82          -.63*                 -.18      .02
Female                  2.07          2.91          -.84**                -.25      .00
Difference              .11           -.10          .21                   .06       .55
K-8                     1.90          2.56          -.66**                -.20      .00
9-12                    3.16          4.24          -1.08*                -.29      .03
Difference              -1.26         -1.67         .42                   .12       .43
Cohort 2                1.96          2.74          -.78**                -.23      .00
Cohort 1                2.75          3.34          -.60                  -.18      .19
Difference              -.78          -.61          -.18                  -.05      .72

*Statistically significant at the 95 percent confidence level.

**Statistically significant at the 99 percent confidence level.

NOTES: Means are regression-adjusted using a consistent set of baseline covariates. Effect sizes are in terms of standard
deviations. Valid N = 1,672. Parent survey weights used.

Figure 4-5. Group Means After Year 1: Parent Perceptions of School Danger

**Statistically significant at the 99 percent confidence level.

NOTE: The means represent the number of items of concern on a 10-item index.



The impacts of the Program on parental self-reports of school danger and disorder were 
statistically significant regarding half of the individual items that made up the danger index (see appendix 
I). Parents were significantly less likely to report school problems of property destruction, tardiness, 
truancy, fighting, and cheating if their child was in the treatment compared to the control group. The 
impacts of the Program on these parental self-reports of dangerous school conditions ranged from .11 to .24
standard deviations. Treatment impacts were not statistically significant regarding parental school danger 
reports of racial conflict, weapons, drug dealing, drug or alcohol use, or teacher absenteeism. 



Accounting for Multiple Comparisons 

The Benjamini-Hochberg adjustments for multiple comparisons did not change the results 
regarding the impact of the Program on parental perceptions of school safety among the SINI-status, 
performance, gender, grade-level, and cohort subgroups tested (see appendix J). Even after adjustments 
for multiple comparisons, all subgroups of parents except those in cohort 1 demonstrated statistically 
significant reductions in their perceptions of school danger as a result of the Program. 
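
The Benjamini-Hochberg procedure referred to above can be sketched as follows. This is a minimal illustration rather than the evaluation's own code (appendix J describes the actual adjustments): it assumes a 5 percent false discovery rate and treats the 10 subgroup p-values from table 4-4, as rounded there, as the full family of comparisons. The function name and both of those choices are assumptions.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True if it stays significant
    under the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every p-value ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        significant[idx] = rank <= cutoff_rank
    return significant


# Subgroup p-values as rounded in table 4-4 (assumed family of 10 tests).
subgroup_p = {
    "SINI ever": 0.01, "SINI never": 0.01,
    "Lower performance": 0.01, "Higher performance": 0.00,
    "Male": 0.02, "Female": 0.00,
    "K-8": 0.00, "9-12": 0.03,
    "Cohort 2": 0.00, "Cohort 1": 0.19,
}
flags = benjamini_hochberg(list(subgroup_p.values()))
still_significant = dict(zip(subgroup_p, flags))
not_significant = [name for name, ok in still_significant.items() if not ok]
print(not_significant)  # ['Cohort 1']
```

Under these assumptions every subgroup except cohort 1 remains significant, which agrees with the result reported above.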

Sensitivity Checks



The programmatic impacts on parental reports of school danger were consistent across 
alternative analytic approaches (table 4-5). Regardless of how the data were analyzed, parents’ perception 
of school danger was significantly lower if their child was offered a scholarship. 



Table 4-5. Year 1 Parent Perceptions of School Danger ITT Regression-Based Impact Estimates
and P-Values Under Alternative Specifications

                        Original Estimates  Trimmed Sample      Limited Covariates  Without Imputed
                                                                Estimate            Data
Outcome                 Impact    p-value   Impact    p-value   Impact    p-value   Impact    p-value
School danger: parents  -.74**    .00       -.72**    .00       -.69**    .00       -.72**    .00

**Statistically significant at the 99 percent confidence level.
NOTES: Valid N = 1,672. Parent survey weights used.



In summary, the year 1 outcome data reveal a large and statistically significant
difference in parental self-reports of dangerous or disorderly conditions at their child’s school
depending on whether the child received a scholarship. Treatment group parents were significantly less 
likely to report serious concerns about school danger compared to control group parents. This 
programmatic impact on school danger was evident among every subgroup of participants analyzed 
except for the parents of cohort 1 students. The estimates of the school danger impacts of the Program, in 
general and for subgroups, were not measurably affected by adjustments for multiple comparisons or 
alternative analytic approaches. 



Student Self-Reports 

The students in grades 4-12 who completed surveys paint a somewhat different picture about 
dangerous activities at their school than do their parents. The student index of school danger asked 
students if they personally had been a victim of theft, drug-dealing, assaults, threats, bullying, or taunting 
or had observed weapons at school. On average, reports of danger by students offered scholarships 
through the lottery were similar to those of the control group. The results did not differ among or across 
the subgroups analyzed (table 4-6, figure 4-6). (See appendix I for a detailed table with the individual 
items for the full sample.) 







Table 4-6. Year 1 Student Perceptions of School Danger: ITT Impacts for Full Sample and
Subgroups

Regression-Based Impact Estimates

                         Treatment   Control     Difference
                         Group       Group       (Estimated  Effect
School Danger: Students  Mean        Mean        Impact)     Size        p-value
Full sample              1.89        2.09        -.21        -.11        .22
SINI ever                2.02        2.25        -.24        -.12        .30
SINI never               1.80        1.98        -.18        -.10        .44
Difference               .22         .27         -.05        -.03        .86
Lower performance        2.07        2.13        -.06        -.03        .84
Higher performance       1.82        2.08        -.26        -.14        .19
Difference               .25         .05         .20         .10         .57
Male                     2.09        2.40        -.31        -.15        .21
Female                   1.69        1.80        -.11        -.06        .62
Difference               .40         .60         -.20        -.10        .53
4-8                      1.99        2.18        -.19        -.10        .34
9-12                     1.44        1.73        -.29        -.16        .19
Difference               .55         .45         .10         .05         .73
Cohort 2                 1.96        2.19        -.24        -.12        .25
Cohort 1                 1.65        1.76        -.10        -.06        .66
Difference               .30         .44         -.13        -.07        .66

NOTES: Means are regression-adjusted using a consistent set of baseline covariates. Effect sizes are in terms of standard
deviations. Valid N = 968. Student survey weights used. Survey given to students in grades 4-12.



Figure 4-6. Group Means After Year 1: Student Perceptions of School Danger

NOTE: The means represent the number of incidents on an 8-item index.

Accounting for Multiple Comparisons



Because the Benjamini-Hochberg adjustment for multiple comparisons makes significance tests more
conservative, it did not change the already statistically nonsignificant results regarding the impact of
the Program on student reports of school danger among the SINI-status, performance, gender, grade-level, 
and cohort subgroups tested. 



Sensitivity Checks 

The results of student reports of school danger were consistent across alternative analytic 
approaches (table 4-7). Regardless of how the data were analyzed, responses of those offered a 
scholarship did not differ significantly from control group students’ perception of school danger. 



Table 4-7. Year 1 Student Perceptions of School Danger ITT Regression-Based Impact Estimates
and P-Values Under Alternative Specifications

                         Original Estimates  Trimmed Sample      Limited Covariates  Without Imputed
                                                                 Estimate            Data
Outcome                  Impact    p-value   Impact    p-value   Impact    p-value   Impact    p-value
School danger: students  -.21      .22       -.21      .24       -.19      .27       -.24      .16

NOTES: Valid N = 968. Student survey weights used. Survey given to students in grades 4-12.



4.4 Impacts on School Satisfaction 

Economists have long used customer satisfaction as a proxy measure for product or service 
quality (see Johnson and Fornell, 1991). While not specifically identified as an outcome to be studied, it 
is an indicator of the “success of the Program in expanding options for parents,” which the Congress 
asked the evaluation to consider (see Section 309 of the District of Columbia School Choice Incentive Act 
of 2003). Satisfaction is also an outcome studied in the previous evaluations of school scholarship 
programs, all of which concluded that parents tend to be significantly more satisfied with their child’s 
school if they have had the opportunity to select it (see Greene, 2001, pp. 84-85). 

Parent Self-Reports



Seven months after the start of their experience with the OSP, parents are more satisfied with 
their child’s school if they were offered a scholarship (table 4-8, figures 4-7 through 4-9). Three different 
measures of parent satisfaction all show large statistically significant positive impacts of the Program on 
parental evaluations of their child’s school: 

• A total of 74 percent of treatment parents assigned their child’s school a grade of A or 
B compared with 55 percent of control parents — a difference of 19 percentage points. 

• On a scale of A-F, the average grade assigned to the school by parents of treatment
students was nearly one-half grade higher than that of control parents, a statistically
significant difference. 

• Parents of students in the treatment group scored an average of more than three points 
higher than parents of students in the control group on the school satisfaction 
index — again a statistically significant programmatic impact. 

The magnitude of these positive OSP impacts on parental satisfaction (the effect sizes) 
ranged from .37 to .38 standard deviations. The parent satisfaction scale comprised 12 separate items 
asking how dissatisfied or satisfied they were with a variety of characteristics of their child’s school 
including location, academics, teachers, facilities, safety, communication, parental support, etc. Each of 
the 12 items was rated on a 4-point scale. Using Item Response Theory (IRT) techniques, a summary 
scale was constructed with a range of .55 to 35.54. Positive statistically significant treatment impacts were 
observed for each of the 12 individual components of the parental school satisfaction scale (see appendix 
I for a detailed table with the individual items). 

Table 4-8. Year 1 Parental Satisfaction ITT Impacts

Regression-Based Impact Estimates

                                              Treatment   Control     Difference
                                              Group       Group       (Estimated  Effect
Outcome                                       Mean        Mean        Impact)     Size        p-value
Parents who gave school a grade of A or B     .74         .55         .19**       .38         .00
Average grade parent gave school (5.0 scale)  3.98        3.57                    .37         .00
School satisfaction scale                     25.88       22.73       3.15**      .37         .00

**Statistically significant at the 99 percent confidence level.

NOTES: Means are regression-adjusted using a consistent set of baseline covariates. Effect sizes are in terms of standard
deviations. Valid N for school grade = 1,675; parent satisfaction = 1,686. Parent survey weights used. Impact estimates
reported for the dichotomous variable “parents who gave school a grade of A or B” are reported as marginal effects.



Figure 4-7. Group Means After Year 1: Percentage of Parents Who Gave School Grade A or B

Treatment**: 74%; Control: 55%

**Statistically significant at the 99 percent confidence level.

Figure 4-8. Group Means After Year 1: Average Grade Parent Gave School

Treatment**: 3.98; Control: 3.57

**Statistically significant at the 99 percent confidence level.

Figure 4-9. Group Means After Year 1: Parent School Satisfaction Scale

Treatment**: 25.88; Control: 22.73

**Statistically significant at the 99 percent confidence level.
NOTE: Scale range of .55 - 35.54.



The impact of the Program on parental satisfaction was positive and consistent across the 
various subgroups of participants, with some notable exceptions (table 4-9). Specifically: 



• Although parents of students who had attended a SINI school were more likely to 
grade their child’s school A or B if they had been offered a scholarship, the impact of 
the Program on that outcome was somewhat lower for them (11 percentage point 
average gain) compared with parents of non-SINI students offered a scholarship (26 
percentage point average gain). 

• Parents of scholarship students in grades K-8 reported a much higher impact on the 
grade they assigned to their child’s school than did parents of scholarship students in 
high school. 

• The likelihood of a parent of a high school student grading his/her child’s school A or 
B and the average grade that parents of high school students gave their child’s school
did not differ significantly as a result of the treatment. 

• Although parents of both male and female students scored higher on the satisfaction 
scale if their child had been offered a scholarship, the impact on satisfaction was 
significantly higher for the parents of girls (over four points) than for the parents of
boys (two points).

Table 4-9. Year 1 Parent Satisfaction Differential ITT Impacts for Subgroups

                        Treatment   Control     Difference
                        Group       Group       (Estimated  Effect
Subgroups               Mean        Mean        Impact)     Size        p-value

Parents Who Gave Their School a Grade of A or B
SINI ever               .63         .53         .11*        .22         .01
SINI never              .83         .57         .26**       .52         .00
Difference              -.19        -.04        -.16*       -.32        .01
Lower performance       .67         .50         .18**       .36         .00
Higher performance      .77         .57         .19**       .39         .00
Difference              -.09        -.08        -.01        -.03        .81
Male                    .71         .53         .18**       .36         .00
Female                  .77         .57         .20**       .40         .00
Difference              -.06        -.04        -.02        -.04        .75
K-8                     .78         .56         .22**       .44         .00
9-12                    .57         .52         .05         .11         .38
Difference              .20         .04         .16*        .33         .01
Cohort 2                .76         .56         .20**       .40         .00
Cohort 1                .66         .50         .16**       .32         .01
Difference              .10         .06         .04         .07         .60

Average Grade Parent Gave School (5.0 Scale)
SINI ever               3.79        3.46        .33**       .28         .00
SINI never              4.13        3.65        .48**       .45         .00
Difference              -.35        -.20        -.15        -.14        .23
Lower performance       3.84        3.37        .46**       .40         .00
Higher performance      4.04        3.65        .39**       .36         .00
Difference              -.21        -.28        .07         .07         .59
Male                    3.92        3.53        .39**       .35         .00
Female                  4.04        3.60        .44**       .39         .00
Difference              -.12        -.07        -.05        -.04        .68
K-8                     4.05        3.58        .47**       .42         .00
9-12                    3.67        3.51        .16         .15         .21
Difference              .38         .08         .30*        .27         .04
Cohort 2                4.03        3.61        .41**       .37         .00
Cohort 1                3.83        3.41        .42**       .38         .00
Difference              .20         .21         -.01        -.01        .96

School Satisfaction Scale
SINI ever               24.60       21.74       2.86**      .34         .00
SINI never              26.82       23.45       3.37**      .40         .00
Difference              -2.22       -1.71       -.51        -.06        .58
Lower performance       25.30       21.66       3.64**      .41         .00
Higher performance      26.12       23.18       2.94**      .36         .00
Difference              -.82        -1.52       .70         .08         .48

Table 4-9. Year 1 Parent Satisfaction Differential ITT Impacts for Subgroups (continued)

                        Treatment   Control     Difference
                        Group       Group       (Estimated  Effect
Subgroups               Mean        Mean        Impact)     Size        p-value

School Satisfaction Scale (cont’d)
Male                    25.46       23.40       2.06**      .26         .00
Female                  26.33       22.14       4.19**      .48         .00
Difference              -.87        1.26        -2.13*      -.25        .02
K-8                     26.26       23.01       3.25**      .39         .00
9-12                    24.12       21.44       2.68**      .31         .01
Difference              2.14        1.57        .57         .07         .63
Cohort 2                26.14       22.99       3.15**      .38         .00
Cohort 1                24.92       21.76       3.16**      .36         .00
Difference              1.22        1.23        -.01        -.00        .99

*Statistically significant at the 95 percent confidence level.
**Statistically significant at the 99 percent confidence level.

NOTES: Means are regression-adjusted using a consistent set of baseline covariates. Effect sizes are in terms of standard
deviations. Valid N for school grade = 1,675; parent satisfaction = 1,686. Parent survey weights used. Impact estimates
reported for the dichotomous variable “parents who gave school a grade of A or B” are reported as marginal effects.



Because the differences in satisfaction impacts across the subgroups were not entirely 
consistent across the alternative measures of satisfaction, readers are cautioned against drawing strong 
conclusions about certain subgroups of treatment parents being more or less satisfied with their child’s 
schools. 



Accounting for Multiple Comparisons 

The adjustment for multiple comparisons had little effect on the statistical significance of the 
subgroup impacts on parent satisfaction (see appendix J). Although a total of 45 comparisons were 
made — 15 comparisons for each of three satisfaction measures — the pattern of positive programmatic 
impacts was sufficiently pronounced that almost all of the results that were statistically significant 
individually remained significant once adjustments were made for multiple comparisons. The only 
exception was the difference in the impact of the Program on the average grade given by parents of K-8 
versus high school students — a significant difference across those two subgroups of treatment members 
that may be a false discovery based on the Benjamini-Hochberg adjustment. 

Sensitivity Checks



The positive impact of the Program on parent satisfaction was not sensitive to alternative 
analytic approaches (table 4-10). The impact estimates for the three different measures of satisfaction 
were all highest when the trimmed sample of observations was used and lowest when missing data 
variables were used in place of imputed data. However, across all alternative specifications of the parent 
satisfaction impacts, parents self-reported significantly higher levels of school satisfaction if their child 
had been awarded a scholarship. 



Table 4-10. Year 1 Parent Satisfaction ITT Regression-Based Impact Estimates and P-Values
with Alternative Specifications

                                              Original Estimates  Trimmed Sample      Limited Covariates  Without Imputed
                                                                                      Estimate            Data
Outcome                                       Impact    p-value   Impact    p-value   Impact    p-value   Impact    p-value
Parents who gave school a grade of A or B     .19**     .00       .20**     .00                 .00       .17**     .00
Average grade parent gave school (5.0 scale)            .00       .43**     .00       .41**     .00       .38**     .00
School satisfaction scale                     3.15**    .00       3.44**    .00       3.13**    .00       2.98**    .00

**Statistically significant at the 99 percent confidence level.

NOTES: Valid N for school grade = 1,675; parent satisfaction = 1,686. Parent survey weights used. Impact estimates reported
for the dichotomous variable “parents who gave school a grade of A or B” are reported as marginal effects.



Student Self-Reports 

As was true with the dangerous activity measures, students had a different view of their
satisfaction with their schools than did their parents. Seven months into their first school year after
applying to the OSP, the responses of members of the treatment group did not differ significantly from
those of the control group regarding school satisfaction (table 4-11). (Only students in grades 4-12 were
administered surveys, so the satisfaction of students in early elementary grades is unknown.) Specifically,
there was no impact from the Program on students’ likelihood of assigning their school a grade of A or B,
the average grade they assigned their school, or their reports of satisfaction with their school (figures
4-10 through 4-12). (See appendix I for a detailed table with the individual items.)

Table 4-11. Year 1 Student Satisfaction ITT Impacts

                                               Treatment   Control     Difference
                                               Group       Group       (Estimated  Effect
Outcome                                        Mean        Mean        Impact)     Size        p-value
Students who gave school a grade of A or B     .70         .72         -.01        -.03        .74
Average grade student gave school (5.0 scale)  3.99        3.99        .00         .00         .99
School satisfaction scale                      33.74       33.56       .38         .06         .48

NOTES: Means are regression-adjusted using a consistent set of baseline covariates. Effect sizes are in terms of standard
deviations. Valid N for school grade = 901; student satisfaction = 983. Student survey weights used. Impact estimates
reported for the dichotomous variable “students who gave school a grade of A or B” are reported as marginal effects.
Survey given to students in grades 4-12.



Figure 4-10. Group Means After Year 1: Percentage of Students Who Gave School Grade A or B

Treatment: 70%; Control: 72%

Figure 4-11. Group Means After Year 1: Average Grade Student Gave School

Figure 4-12. Group Means After Year 1: Student School Satisfaction Scale Rating

NOTE: Scale range of 8.44 - 46.45.



There were some differences across subgroups of students, however. In particular, there 
were negative impacts on satisfaction for students from SINI schools and for students who entered the 
Program with relative academic disadvantages (table 4-12): 



• Among students from SINI schools, those awarded scholarships were more likely to 
grade their school poorly than were those in the control group. In contrast, among 
students from non-SINI schools, those awarded scholarships were more likely to grade 
their school highly than were those in the control group. Neither subgroup impact was 
statistically significant; however, the difference in the treatment impact between 
students from SINI and non-SINI schools was itself statistically significant. 

• Among students who applied to the Program with lower academic achievement, those 
awarded a scholarship were more likely to grade their school poorly than were those in 
the control group. This subgroup impact was statistically significant. The difference in 
the impact on satisfaction for lower achievement students versus higher achievement 
students also was itself statistically significant. 







Table 4-12. Year 1 Student Satisfaction Differential ITT Impacts for Subgroups

                        Treatment   Control     Difference
                        Group       Group       (Estimated  Effect
Subgroups               Mean        Mean        Impact)     Size        p-value

Students Who Gave Their School a Grade of A or B
SINI ever               .61         .71         -.10        -.22        .06
SINI never              .79         .72         .07         .15         .26
Difference              -.18        -.02        -.18*       -.40        .04
Lower performance       .60         .81         -.22**      -.43        .00
Higher performance      .75         .68         .06         .12         .21
Difference              -.15        .13         -.32**      -.72        .00
Male                    .69         .73         -.04        -.09        .47
Female                  .72         .71         .01         .02         .84
Difference              -.04        .01         -.05        -.12        .52
4-8                     .76         .74         .02         .04         .72
9-12                    .48         .60         -.12        -.24        .06
Difference              .28         .15         .13         .29         .08
Cohort 2                .73         .75         -.02        -.05        .69
Cohort 1                .61         .61         .00         .01         .96
Difference              .12         .14         -.02        -.05        .76

Average Grade Student Gave School (5.0 Scale)
SINI ever               3.77        3.85        -.08        -.07        .55
SINI never              4.15        4.08        .07         .07         .57
Difference              -.37        -.23        -.14        -.14        .41
Lower performance       3.84        4.17        -.34*       -.38        .02
Higher performance      4.05        3.92        .13         .12         .24
Difference              -.21        .25         -.46**      -.46        .01
Male                    3.94        3.98        -.05        -.05        .70
Female                  4.04        3.99        .05         .05         .70
Difference              -.10        -.01        -.09        -.09        .58
4-8                     4.12        4.09        .04         .04         .71
9-12                    3.41        3.57        -.16        -.16        .30
Difference              .71         .52         .20         .20         .27
Cohort 2                4.06        4.11        -.05        -.06        .61
Cohort 1                3.76        3.58        .18         .17         .23
Difference              .30         .53         -.24        -.24        .19

School Satisfaction Scale
SINI ever               32.54       32.21       .33         .05         .70
SINI never              34.59       34.17       .42         .07         .54
Difference              -2.05       -1.96       -.10        -.01        .93
Lower performance       32.73       32.15       .58         .12         .48
Higher performance      34.09       33.79       .30         .04         .66
Difference              -1.36       -1.64       .28         .04         .80

Table 4-12. Year 1 Student Satisfaction Differential ITT Impacts for Subgroups (continued)

                        Treatment   Control     Difference
                        Group       Group       (Estimated  Effect
Subgroups               Mean        Mean        Impact)     Size        p-value

School Satisfaction Scale (cont’d)
Male                    33.82       33.16       .66         .10         .38
Female                  33.67       33.55       .12         .02         .87
Difference              .15         -.39        .55         .08         .60
4-8                     34.03       33.57       .46         .07         .46
9-12                    32.47       32.43       .03         .01         .97
Difference              1.56        1.14        .42         .07         .70
Cohort 2                33.70       33.51       .19         .03         .75
Cohort 1                33.91       32.84       1.07        .14         .31
Difference              -.21        .66         -.88        -.13        .46

*Statistically significant at the 95 percent confidence level.
**Statistically significant at the 99 percent confidence level.

NOTES: Means are regression-adjusted using a consistent set of baseline covariates. Effect sizes are in terms of standard
deviations. Valid N for school grade = 901; student satisfaction = 983. Student survey weights used. Impact estimates
reported for the dichotomous variable “students who gave school a grade of A or B” are reported as marginal effects.
Survey given to students in grades 4-12.



Accounting for Multiple Comparisons 

Adjustments for multiple comparisons did suggest that some of the negative subgroup
impacts on student satisfaction may be false discoveries (see appendix J). The initial finding that the
impact of the Program on the probability of students assigning their schools a grade of A or B was
different for SINI-ever (negative) than SINI-never (positive) participants was no longer statistically
significant after the adjustment. Similarly, the initial result suggesting a negative impact of the Program
on the likelihood of high school students assigning their school an A or B could be a false discovery once
the effect of multiple comparisons is factored into the significance test. The initial findings that lower 
performing students were less likely to grade their school A or B and assigned lower overall grades if they 
were offered scholarships remained statistically significant even after adjustments for multiple 
comparisons. The finding that this negative impact of the Program on student satisfaction was different 
for lower performing students than it was for higher performing students, who on average did not grade 
their schools more negatively if offered a scholarship, also remained statistically significant after 
adjustments for multiple comparisons. 

Sensitivity Checks



The main findings of no programmatic impact on overall student self-reports of satisfaction 
were consistent across the three alternative methodological approaches (table 4-13). There are no 
differences in the average student satisfaction levels of the overall treatment and control group after 1 
year regardless of the satisfaction measure or analytic method used. 



Table 4-13. Year 1 Student Satisfaction ITT Regression-Based Impact Estimates and P-Values with
Alternative Specifications

                                               Original Estimates  Trimmed Sample      Limited Covariates  Without Imputed
                                                                                       Estimate            Data
Outcome                                        Impact    p-value   Impact    p-value   Impact    p-value   Impact    p-value
Students who gave school a grade of A or B     -.01      .74       .01       .86       -.01      .74       -.02      .64
Average grade student gave school (5.0 scale)  .00       .99       .04       .63       -.00      .97       .01       .92
School satisfaction scale                      .38       .48       .49       .37       .27       .62       .42       .43

NOTES: Valid N for school grade = 901; student satisfaction = 983. Student survey weights used. Impact estimates reported for
the dichotomous variable “students who gave school a grade of A or B” are reported as marginal effects. Survey given
to students in grades 4-12.



4.5 Summary of Experimental Impacts 

One year after the students of cohort 1 and cohort 2 applied to the OSP and 7 months after 
they began their post-randomization educational experiences, those in the treatment group appear to be 
performing in mathematics and reading at a level comparable to the control group. The primary method of 
analysis and two of the three alternative methods indicated no statistically significant achievement gains 
attributable to the Program. The remaining alternative method, using the trimmed response sample, 
suggested an overall treatment impact of 3.4 scale score points in math that was statistically significant. 
Two subgroups of study participants — students who were non-SINI applicants and students who were 
higher performing when they applied — may have benefited from the Program in terms of their math 
achievement; however, adjustments that account for the fact that these two positive findings emerged 
from multiple comparisons suggest that they might be false discoveries, so they should be interpreted 
with caution. 

The data indicate that parents are much more satisfied with their child’s school and view it
as safer in many ways if they were offered an Opportunity Scholarship. These parent satisfaction impacts 
are positive, large, and consistent across almost all subgroups and not sensitive to adjustments for 
multiple comparisons or alternative analytic approaches. However, the Program had no impact overall on 
students’ views of school safety and satisfaction; students who entered the Program with lower academic 
achievement graded their schools lower if they had been offered a scholarship. 

These results can be placed in the context of other RCTs of scholarship programs for low- 
income students, which suggest no consistent pattern of academic achievement impacts for the first year 
of program participation. Among such evaluations of four privately funded scholarship programs, one 
study of the Charlotte, North Carolina, program clearly found statistically significant overall impacts on 
math and reading for the first year, while one of three analyses of the New York City program found 
overall impacts on math achievement (Barnard et al., 2003; Greene, 2000). When African-Americans are
considered separately — a group that makes up nearly 90 percent of the OSP impact study sample — two of 
three analyses of the New York City program suggest there were achievement gains in math for African-
American students in some grade levels (Barnard et al., 2003; Mayer et al., 2002), but studies of the
Dayton, Ohio, and earlier District of Columbia programs found no impacts for this group until students
were in the program for 2 years (Howell et al., 2002). In contrast, all of the RCTs that measured parent
satisfaction and perceptions of school safety found positive impacts similar to those demonstrated by the
OSP the first year (Greene, 2000; Howell and Peterson et al., 2002).

5. The Effects of OSP Scholarship Use and Private Schooling



The previous chapter described the experimental impacts from the Program, observed about
a year after students applied to it. Those results provide initial answers to the question: “What happened 
to qualified applicants who were offered Opportunity Scholarships?” However, as outlined elsewhere in 
the report, there are related questions about Program effects that are also of interest to policymakers. The 
two most important are addressed here, and not in the prior chapter, because they deviate to different 
extents from the randomized trial that grounds the rigor of the evaluation (see chapter 3 for a discussion 
of methodology and appendix G for the specific statistical techniques and models used here). This chapter 
presents information on the estimated effects of (1) “using” a scholarship to attend a participating private 
school (what we call the impact on the treated or IOT) and (2) attending a private school, regardless of
scholarship use, with the latter based on statistical techniques that make the estimate the best proxy for, 
but not, an experimental impact. Each section starts off with the context and rationale for the analysis and 
then describes the analysis results. 



5.1 Effect of Using a Scholarship 

The intention-to-treat (ITT) impacts discussed in chapter 4 describe the average impacts of 
the OSP on the entire group of students who were selected randomly in the lottery to receive the offer of a 
scholarship. However, about one-fifth of these treatment students did not use their scholarship the first 
year. Since the rate of treatment nonuse will vary over time and across scholarship programs, 
policymakers have expressed an interest in understanding the impact of the OSP on the students who 
actually used their scholarships to attend participating private schools (the IOT estimate). 



Interpreting the Impacts on the Treated (IOT) 

As described in chapter 3, estimating the impact of using an OSP scholarship involves 
“netting out” from the ITT results two groups of students: (1) the 20 percent who received but did not 
take up the scholarship offer and who, therefore, presumably experienced zero impact from the Program, 
and (2) the hypothesized 4 percent who never received a scholarship offer but who, by virtue of having a 
sibling with an OSP scholarship, wound up in a participating private school. Both adjustments have the 
effect of increasing the size of any observed statistically significant ITT impacts of the Program. 

These statistical procedures for estimating the effects of scholarship usage do not change the 
treatment- or control-group status of any participants. Bloom adjustments draw upon the existing patterns 
in the experimental data to generate unbiased estimates of what the programmatic effects were for the 
treatment members who actually used their scholarship compared with the counterfactual provided by the 
control group. The numbers of “noncompliers” (treatment “decliners” and control “crossovers”) factor 
into these statistical estimates, but treatment decliners are never mixed with the control group nor are 
control crossovers considered treatment-group members for purposes of these calculations. The findings 
from the purely experimental ITT analysis are simply taken from the entire treatment group and rescaled 
across the subgroup of treatment users (i.e., Bloom adjusted). The double-Bloom adjustments presented 
here merely further adjust mathematically for the somewhat anomalous cases of Program-induced control 
crossovers that emerged in this particular RCT. 

When estimating the impact of scholarship use (IOT), the ITT results serve an important 
screening function. If an impact is not statistically significant in the ITT stage of the analysis, it makes 
little sense to consider what its IOT effects might be. Experimental impacts that fail to attain an 
acceptable level of statistical significance are considered to be impacts of zero, and an impact of zero 
rescaled by way of a Bloom or double-Bloom adjustment will remain zero. 

In short, a finding must emerge as statistically significant in the experimental evaluation of 
the impact of the scholarship offer in order for there to be any hope of that finding shedding light on the 
question of the effects of actually using a scholarship if offered one. As a result, we present IOT 
effect estimates only for Program impacts found to be statistically significant in the ITT analysis reported in 
chapter 4. The adjustments for multiple comparisons and sensitivity test results reported for the 
statistically significant impacts in chapter 4 apply directly to the rescaled IOT results presented here. We 
use the same full sample of year 1 outcome observations used in the ITT analysis for the subsequent 
analyses here. 



IOT Effects on Achievement 

The ITT analysis presented in chapter 4 found no evidence of an overall impact of the first 
year of the Program on student test scores in either reading or math. The estimated effect of using a 
scholarship on these outcomes is likewise not distinguishable from zero. 

However, there may have been ITT impacts on math scores for two subgroups: students who 
applied from non-SINI schools and those who entered the Program with relatively higher levels of 
academic achievement. To begin the process of computing the effects of using a scholarship, those impact 
estimates can be rescaled (table 5-1) to reflect the fact that only 80 percent of the treatment group was 
actually using a scholarship, resulting in 

• An estimated math impact of 5.8 scale score points for non-SINI treatment users; and 

• An estimated math impact of 5.4 scale score points for higher performing treatment 
users. 

The second step in the estimation nets out the hypothesized program-induced crossover of 
the 4 percent of control group members who “piggy-backed” onto their treatment siblings to gain free 
admission to private schools. This leads to slightly higher estimates of the few initially significant 
subgroup treatment effects: 

• An estimated math impact of 6.1 scale score points for non-SINI scholarship users; and 

• An estimated math impact of 5.6 scale score points for higher performing scholarship 
users. 
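
The two rescaling steps above amount to simple division, which can be sketched as follows. This is a 
minimal illustration, not the evaluation's actual code: the function name `bloom_adjust` is hypothetical, 
and the rounded rates of 80 percent usage and 4 percent crossover are used in place of the unrounded rates 
behind table 5-1, so the results agree with the published figures only to within about 0.05 scale score 
points. 

```python
def bloom_adjust(itt_impact, usage_rate, crossover_rate=0.0):
    """Rescale an ITT impact to an impact on scholarship users.

    Single Bloom adjustment: divide by the treatment-group usage rate.
    Double Bloom adjustment: divide by the usage rate minus the
    control-group crossover rate.
    """
    net_takeup = usage_rate - crossover_rate
    if net_takeup <= 0:
        raise ValueError("usage rate must exceed crossover rate")
    return itt_impact / net_takeup


# ITT math impacts (scale score points) for the two subgroups in table 5-1.
itt_impacts = {"SINI never": 4.68, "Higher performance": 4.30}

USAGE_RATE = 0.80      # rounded rate; an assumption for illustration
CROSSOVER_RATE = 0.04  # rounded rate; an assumption for illustration

for group, itt in itt_impacts.items():
    single = bloom_adjust(itt, USAGE_RATE)
    double = bloom_adjust(itt, USAGE_RATE, CROSSOVER_RATE)
    print(f"{group}: single Bloom {single:.2f}, double Bloom {double:.2f}")
```

Because the divisor is always less than 1, both adjustments can only enlarge a nonzero ITT impact, and an 
ITT impact of zero stays zero after rescaling. 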



Table 5-1. IOT Achievement Estimates for Statistically Significant Subgroup Impacts on 
Treatment Users 

                             Original ITT Estimates   Usage   Single Bloom   Program-Enabled   Double Bloom
Student Achievement Groups   Impact       p-value     Rate    Adjustment     Crossover         Adjustment

SINI never: Math             4.68*        .04         80%     5.84*          4%                6.12*
Higher performance: Math     4.30*        .03         80%     5.37*          4%                5.62*



*Statistically significant at the 95 percent confidence level. 

NOTES: Valid N for math = 1,715. Math sample weights used. Impacts are displayed in terms of scale scores. 



The sizes of the year 1 math programmatic effects on certain subgroups of treatment users 
are (table 5-2) 

• .17 standard deviations (SD) for SINI-never treatment users (single Bloom 
adjustment), 

• .18 SD for SINI-never treatment users accounting for Program-induced crossovers 
(double Bloom adjustment), 

• .16 SD for higher performance treatment users (single Bloom adjustment), and 

• .17 SD for higher performance treatment users accounting for Program-induced 
crossovers (double Bloom adjustment). 



These effect sizes are in the general range of moderate magnitude for test-score results (Grissmer et al., 
2000, p. 59; Howell and Peterson et al., 2002, p. 151; Krueger, 1999, p. 525). 

Table 5-2. Effect Sizes for Statistically Significant Subgroup Impacts on 
Treatment Users 

Student Achievement          Original ITT   Single Bloom   Double Bloom
Groups                       Estimates      Adjustment     Adjustment

SINI never: Math             .12            .17            .18
Higher performance: Math     .12            .16            .17



Adjustments for Multiple Comparisons and Sensitivity Checks 

The statistical adjustments for multiple comparisons conducted for the ITT analysis apply 
directly to the Bloom and double Bloom rescaling of the ITT impacts presented here. Just as the 
Benjamini-Hochberg adjustments suggested that the two subgroup ITT impacts in math might be false 
discoveries, the same applies to the mathematical rescaling of those original findings. The sensitivity tests 
conducted on the math subgroup impacts discussed in chapter 4 also apply directly here, suggesting that 
the effects are not sensitive to alternative analytic approaches. 
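
For readers unfamiliar with the Benjamini-Hochberg procedure, its step-up rule can be sketched as follows. 
This is an illustrative implementation only, not the evaluation's code: the function name, the 0.05 false 
discovery rate, and the family of eight p-values in the usage example are assumptions. Only the values .04 
and .03 come from table 5-1; the other six are hypothetical placeholders for the remaining comparisons in 
the family, whose number and values are not reported in this chapter. 

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the finding survives the
    Benjamini-Hochberg step-up adjustment at the given false discovery rate."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value is at or below (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Every finding ranked at or below the cutoff survives.
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            survives[idx] = True
    return survives


# The first two p-values are from table 5-1; the rest are HYPOTHETICAL.
family = [0.04, 0.03, 0.20, 0.45, 0.60, 0.75, 0.81, 0.90]
print(benjamini_hochberg(family))
```

In this hypothetical family neither .03 nor .04 survives the adjustment, which illustrates how an impact 
that is statistically significant on its own can still be flagged as a possible false discovery. 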



IOT Effects on Parental Perceptions of School Safety/Danger 

The ITT analysis also revealed statistically significant Program impacts on parent reports of 
the level of safety at their child’s school, measured by an index of parental perceptions of danger. To 
estimate the effect of the Program on parental perceptions of danger for scholarship users, the ITT 
impacts are rescaled using Bloom and double Bloom calculations (table 5-3). The results are estimated to 
be average reductions on the 10-point parental perception of school danger index of 

• .92 as a result of using a scholarship (single Bloom); and 

• .97 as a result of using a scholarship, factoring in Program-induced crossover (double 
Bloom). 